Anupam Pandey, Srishti Singh, Atul Kr. Ojha, Girish Nath Jha
Published in: 2017 3rd International Conference on Applied and Theoretical Computing and Communication Technology (iCATccT). Publication date: 2017-12-01. DOI: 10.1109/icatcct.2017.8389124 (https://doi.org/10.1109/icatcct.2017.8389124)
Challenges in annotation and domain adaptation in Hindi POS tagger: With reference to cricket
This paper reports an initial experiment on the scope of adapting a multi-domain Hindi POS tagger to a new domain, the popular domain of cricket. The utility of adapting to a new domain is proposed and verified by testing the accuracy of the existing Hindi POS tagger on sports-domain data (here, cricket): average accuracy drops to 87.77%, from the tagger's overall accuracy of approximately 94%. The test output is validated manually in order to generate a correct error report for the sports-domain data. In addition, the inter-annotator agreement and disagreement found among the evaluators and some major tagger errors, such as unseen vocabulary and inconsistent performance, are recorded along with suggestions for improvement, which serve as the basis for introducing domain adaptation for the Hindi tagger.
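The three quantities the abstract refers to (tagger accuracy against manually validated tags, the share of unseen vocabulary, and agreement between evaluators) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the abstract does not name the agreement measure, so Cohen's kappa is used here as one common choice, and the romanized tokens, tag labels, and training vocabulary are hypothetical toy data, not the paper's corpus or tagset.

```python
from collections import Counter


def token_accuracy(gold, predicted):
    """Fraction of tokens whose predicted tag equals the validated tag."""
    if len(gold) != len(predicted) or not gold:
        raise ValueError("tag sequences must be non-empty and equally long")
    return sum(g == p for g, p in zip(gold, predicted)) / len(gold)


def unseen_rate(tokens, training_vocab):
    """Fraction of tokens that do not occur in the training vocabulary."""
    if not tokens:
        raise ValueError("token list must be non-empty")
    return sum(t not in training_vocab for t in tokens) / len(tokens)


def cohen_kappa(tags_a, tags_b):
    """Cohen's kappa for two annotators labelling the same tokens."""
    if len(tags_a) != len(tags_b) or not tags_a:
        raise ValueError("tag sequences must be non-empty and equally long")
    n = len(tags_a)
    observed = sum(a == b for a, b in zip(tags_a, tags_b)) / n
    freq_a, freq_b = Counter(tags_a), Counter(tags_b)
    # Chance agreement: product of each annotator's marginal tag frequencies.
    expected = sum(freq_a[t] * freq_b[t] for t in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used a single identical tag throughout
    return (observed - expected) / (1 - expected)


# Hypothetical toy data (not taken from the paper).
tokens = ["dhoni", "ne", "chhakka", "lagaya"]
validated = ["PROPN", "ADP", "NOUN", "VERB"]    # manually validated tags
tagger_out = ["PROPN", "ADP", "VERB", "VERB"]   # tagger output
annotator_b = ["PROPN", "ADP", "NOUN", "NOUN"]  # a second evaluator
training_vocab = {"ne", "lagaya"}               # assumed general-domain vocabulary

print("accuracy:", token_accuracy(validated, tagger_out))
print("unseen rate:", unseen_rate(tokens, training_vocab))
print("kappa:", round(cohen_kappa(validated, annotator_b), 3))
```

Reporting the unseen-vocabulary rate alongside accuracy makes it easier to tell how much of an accuracy drop on new-domain text coincides with words the tagger has never encountered.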