Development of a Bangla Sense Annotated Corpus for Word Sense Disambiguation

Monisha Biswas, M. M. Hoque
Published in: 2019 International Conference on Bangla Speech and Language Processing (ICBSLP)
Publication date: 2019-09-01
DOI: https://doi.org/10.1109/ICBSLP47725.2019.201516
Citations: 2

Abstract

A sense annotated corpus is an essential resource for lexicon development, for morphological processing, and for evaluating the performance of a word sense disambiguation (WSD) system. In this paper, a Bangla sense annotated corpus is generated from a raw collection of Bangla text: only the sentences that contain at least one ambiguous Bangla word are retrieved from the raw corpus. Every word form in the stored sentences is tagged with its root word form and POS type, and each detected ambiguous word is additionally tagged with its actual sense. The corpus initially contains 5028 annotated Bangla sentences, and the overall performance of the corpus creation system is 86.95%.

Index Terms: Bangla language processing, sense annotated corpus, lexicon, word sense disambiguation, ambiguous word.
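The pipeline described in the abstract (keep only sentences with an ambiguous word, tag every word with root and POS, tag ambiguous words with a sense) can be sketched roughly as follows. This is a minimal illustration and not the authors' system: the lexicon, root table, POS table, romanized placeholder tokens, and the `first_sense` chooser are all hypothetical stand-ins, since the abstract does not specify the actual resources or the sense selection method.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

# Hypothetical toy resources (romanized placeholders, not from the paper).
AMBIGUOUS_SENSES = {"kal": ["tomorrow", "yesterday"]}  # root -> candidate senses
ROOTS = {"kalke": "kal", "boiti": "boi"}               # word form -> root form
POS = {"kal": "NOUN", "boi": "NOUN", "ami": "PRON", "porbo": "VERB"}


@dataclass
class Token:
    form: str                    # surface word form as it appears in the sentence
    root: str                    # root word form
    pos: str                     # POS type, "UNK" if the root is not in the table
    sense: Optional[str] = None  # set only for ambiguous words


def root_of(form: str) -> str:
    """Look up the root form; fall back to the surface form itself."""
    return ROOTS.get(form, form)


def has_ambiguous_word(sentence: str) -> bool:
    """True if at least one word of the sentence has an ambiguous root."""
    return any(root_of(w) in AMBIGUOUS_SENSES for w in sentence.split())


def first_sense(root: str, sentence: str) -> str:
    """Placeholder chooser: always returns the first listed sense."""
    return AMBIGUOUS_SENSES[root][0]


def annotate(sentence: str, choose: Callable[[str, str], str]) -> List[Token]:
    """Tag every word with root and POS; tag ambiguous words with a sense."""
    tokens = []
    for form in sentence.split():
        root = root_of(form)
        sense = choose(root, sentence) if root in AMBIGUOUS_SENSES else None
        tokens.append(Token(form, root, POS.get(root, "UNK"), sense))
    return tokens


def build_corpus(raw: List[str], choose=first_sense) -> List[List[Token]]:
    """Keep only sentences with an ambiguous word, then annotate them."""
    return [annotate(s, choose) for s in raw if has_ambiguous_word(s)]


if __name__ == "__main__":
    raw_sentences = ["ami kalke boiti porbo", "ami boiti porbo"]
    for sentence in build_corpus(raw_sentences):
        print(sentence)
```

In this sketch the sense chooser is passed in as a function, so a manual annotation step or an automatic WSD method could be substituted without changing the filtering and tagging logic.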