Development of a Bangla Sense Annotated Corpus for Word Sense Disambiguation
Monisha Biswas, M. M. Hoque
2019 International Conference on Bangla Speech and Language Processing (ICBSLP), September 2019
DOI: 10.1109/ICBSLP47725.2019.201516
A sense-annotated corpus serves as an essential resource for lexicon development and morphological processing, as well as for evaluating the performance of a word sense disambiguation (WSD) system. In this paper, a Bangla sense-annotated corpus is built from a raw collection of Bangla text by retrieving only those sentences that contain at least one ambiguous Bangla word. Every word form in the stored sentences is tagged with its root form and part-of-speech (POS) type, and each detected ambiguous word is additionally tagged with its actual sense. The resulting corpus initially contains 5,028 properly annotated Bangla sentences, and the corpus creation system achieves an overall performance of 86.95%.
Index Terms: Bangla language processing, Sense annotated corpus, Lexicon, Word sense disambiguation, Ambiguous word.
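The corpus-building pipeline summarized in the abstract has three steps: keep only sentences that contain an ambiguous word, tag every word with its root form and POS type, and tag each ambiguous word with a sense. The following minimal sketch shows those steps. It is not the authors' system. The transliterated toy lexicon `ROOT_POS`, the cue table `SENSE_CUES`, the tag labels, all helper names, and the simple cue-overlap sense selector are hypothetical stand-ins, because the paper's actual root finder, POS tagger, and sense-assignment method are not described here.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set, Tuple

# --- Hypothetical toy resources (transliterated Bangla) -------------------
# Root form and POS tag per surface word; stands in for a real
# morphological analyzer and POS tagger.
ROOT_POS: Dict[str, Tuple[str, str]] = {
    "ami": ("ami", "PRON"),
    "kal": ("kal", "NOUN"),
    "bari": ("bari", "NOUN"),
    "jabo": ("ja", "VERB"),
    "gechilam": ("ja", "VERB"),
    "bhat": ("bhat", "NOUN"),
    "khai": ("kha", "VERB"),
}

# Ambiguous words and, for each sense, a set of cue words (toy data).
# "kal" can mean either "tomorrow" or "yesterday"; verb tense disambiguates.
SENSE_CUES: Dict[str, Dict[str, Set[str]]] = {
    "kal": {
        "tomorrow": {"jabo", "korbo"},
        "yesterday": {"gechilam", "korechilam"},
    },
}


@dataclass
class Token:
    form: str
    root: str
    pos: str
    sense: Optional[str] = None  # set only for ambiguous words


def pick_sense(word: str, context: List[str]) -> Optional[str]:
    """Hypothetical overlap heuristic: choose the sense whose cue set shares
    the most words with the sentence. Returns None if no cue matches;
    ties keep the first-listed sense."""
    best, best_score = None, 0
    for sense, cues in SENSE_CUES[word].items():
        score = len(cues.intersection(context))
        if score > best_score:
            best, best_score = sense, score
    return best


def annotate(sentence: str) -> Optional[List[Token]]:
    """Annotate one sentence, or return None if it has no ambiguous word."""
    words = sentence.split()
    if not any(w in SENSE_CUES for w in words):
        return None  # filtered out: only sentences with an ambiguous word are kept
    tokens = []
    for w in words:
        root, pos = ROOT_POS.get(w, (w, "UNK"))  # unknown words keep their form
        sense = pick_sense(w, words) if w in SENSE_CUES else None
        tokens.append(Token(w, root, pos, sense))
    return tokens


def build_corpus(raw_sentences: List[str]) -> List[List[Token]]:
    """Filter and annotate a raw sentence collection."""
    corpus = []
    for s in raw_sentences:
        annotated = annotate(s)
        if annotated is not None:
            corpus.append(annotated)
    return corpus


if __name__ == "__main__":
    raw = ["ami kal bari jabo", "ami bhat khai", "ami kal bari gechilam"]
    for sent in build_corpus(raw):
        print(" ".join(
            f"{t.form}/{t.root}/{t.pos}" + (f"[{t.sense}]" if t.sense else "")
            for t in sent
        ))
```

In this toy run the sentence "ami bhat khai" contains no ambiguous word and is dropped. The two remaining sentences are kept with "kal" tagged as "tomorrow" and "yesterday" respectively. Replacing the two lookup tables and `pick_sense` with real resources leaves the filtering step in `annotate` unchanged.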