Corpus annotation in inflectional languages: Czech

3 Author(s)
Pala, K. (Fac. of Inf., Masaryk Univ., Brno, Czech Republic); Rychly, P.; Smrz, P.

We present basic information about DESAM, a grammatically annotated and fully disambiguated corpus of Czech, and its structure. The tagging and disambiguation system and its method are also briefly described. We then discuss the tagset used to annotate DESAM and explain how it is structured to cope with a highly inflectional language such as Czech. We mention the tools used to manage the corpus, in particular the corpus query processor CQP. The main focus is on the relation between the size of the DESAM tagset and the measures of ambiguity observed for particular tags. The reliability of tagging with regard to the inventory of tags is also explored. Some considerations based on statistical disambiguation techniques are presented.

Published in:

Database and Expert Systems Applications, 1998. Proceedings. Ninth International Workshop on

Date of Conference:

25-28 Aug 1998