Corpus annotation in inflectional languages: Czech
K. Pala, P. Rychlý, P. Smrz
Proceedings Ninth International Workshop on Database and Expert Systems Applications (Cat. No.98EX130), 1998-08-26
DOI: https://doi.org/10.1109/DEXA.1998.707395
Citations: 2
Abstract
We present basic information about DESAM, a grammatically annotated and fully disambiguated corpus of Czech, and about its structure. The tagging and disambiguation system and its method are also briefly described. We then discuss the tagset used to annotate DESAM and explain how it is structured to cope with a highly inflectional language such as Czech. We mention the tools used to manage the corpus, in particular the corpus query processor CQP. The main focus is on the relation between the size of the DESAM tagset and the measures of ambiguity observed for particular tags. We also explore the reliability of tagging with respect to the inventory of tags, and present some considerations based on statistical disambiguation techniques.