Challenges in annotation and domain adaptation in Hindi POS tagger: With reference to cricket
Anupam Pandey, Srishti Singh, Atul Kr. Ojha, Girish Nath Jha
2017 3rd International Conference on Applied and Theoretical Computing and Communication Technology (iCATccT), published 2017-12-01
DOI: https://doi.org/10.1109/icatcct.2017.8389124
Citations: 1
Abstract
This paper reports on the scope of domain adaptation for a multi-domain Hindi POS tagger to a new domain, namely the popular domain of cricket, through an initial experiment. The utility of adapting to a new domain is proposed and verified by testing the accuracy of the existing Hindi POS tagger on the sports domain (here, cricket), which yields a reduced average accuracy of 87.77%, compared with an overall tagger accuracy of approximately 94%. Manual validation is used to evaluate the test results and to generate a correct error report for the sports-domain data. In addition, the inter-annotator agreement and disagreement found among evaluators, and some major tagger-based errors such as unseen vocabulary and inconsistent performance, are recorded along with suggestions for improvement, which serve as the basis for introducing adaptation for the Hindi tagger.
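The two quantities the abstract relies on, token-level tagger accuracy and agreement between annotators, can be illustrated with a minimal sketch. The abstract does not say which agreement measure the authors used; Cohen's kappa is assumed here purely for illustration, and the function names and tag labels are hypothetical toy values, not the paper's data or tagset.

```python
from collections import Counter


def tagging_accuracy(gold, predicted):
    """Fraction of tokens whose predicted tag equals the manually validated tag."""
    if len(gold) != len(predicted):
        raise ValueError("tag sequences must have the same length")
    if not gold:
        raise ValueError("tag sequences must not be empty")
    return sum(g == p for g, p in zip(gold, predicted)) / len(gold)


def cohen_kappa(tags_a, tags_b):
    """Cohen's kappa for two annotators labelling the same tokens.

    Assumption: the source does not name its agreement measure; kappa is
    one common choice for two annotators.
    """
    if len(tags_a) != len(tags_b):
        raise ValueError("tag sequences must have the same length")
    if not tags_a:
        raise ValueError("tag sequences must not be empty")
    n = len(tags_a)
    # Observed agreement: share of tokens given the same tag by both annotators.
    observed = sum(a == b for a, b in zip(tags_a, tags_b)) / n
    # Chance agreement: computed from each annotator's own tag distribution.
    count_a = Counter(tags_a)
    count_b = Counter(tags_b)
    expected = sum(count_a[t] * count_b[t] for t in count_a) / (n * n)
    if expected == 1.0:
        # Both annotators used a single identical tag throughout.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Toy example with made-up tags (hypothetical, not from the paper).
gold = ["NN", "VM", "NN", "PSP"]
tagger_output = ["NN", "VM", "JJ", "PSP"]
accuracy = tagging_accuracy(gold, tagger_output)

annotator_1 = ["NN", "NN", "VM", "VM"]
annotator_2 = ["NN", "VM", "VM", "VM"]
kappa = cohen_kappa(annotator_1, annotator_2)
```

In this toy run the tagger matches three of four gold tags, and the two annotators agree on three of four tokens. Kappa discounts the agreement that would be expected by chance, which is why it is often reported alongside raw agreement.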