An Approach for Detection and Correction of Missing Word in Bengali Sentence

M. Mridha, Md. Mashod Rana, Md. Abdul Hamid, Md. Eyaseen Arafat Khan, Md. Masud Ahmed, Mohammad Tipu Sultan
2019 International Conference on Electrical, Computer and Communication Engineering (ECCE), published 2019-02-01. DOI: 10.1109/ECACE.2019.8679416
Citations: 3

Abstract

Automatically correcting a missing word in a sentence is difficult, and the task is even more challenging for Bengali. Our study of prior work found that no significant research has been done on this topic for the Bengali language. In this paper, we propose a method that detects a missing word and provides a ranked list of suggestions for it, with an accuracy of 82.82%. We use an n-gram model to decide whether a word is missing between two adjacent words in a sentence. After finding the candidate words for the missing position, we rank the suggestion list using probability scores. The detection decision is made with a bigram corpus, and the candidate words for the missing position are drawn from a trigram corpus. Finally, we evaluate the proposed method on six additional corpora. We built all of the corpora ourselves from data collected from the web.
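The bigram-based detection and trigram-based ranking described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation. The abstract does not give the exact decision rule or scoring formula, so two rules are assumed here:

- A word is flagged as missing between two adjacent tokens when that pair occurs fewer than a hypothetical `min_count` times in the bigram corpus.
- Each candidate word x is scored by the relative frequency of the trigram (left, x, right) among all trigrams that share the same left and right context.

The function names `build_ngrams`, `detect_gaps` and `suggest` are hypothetical. The three-sentence toy corpus stands in for the web-collected corpora.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def build_ngrams(sentences: Iterable[List[str]], n: int) -> Counter:
    """Count every contiguous n-gram (as a tuple of tokens) in tokenized sentences."""
    counts: Counter = Counter()
    for tokens in sentences:
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return counts


def detect_gaps(tokens: List[str], bigrams: Counter, min_count: int = 1) -> List[int]:
    """Return positions i where a word is assumed missing between tokens[i] and tokens[i+1].

    Assumption (not from the paper): an adjacent pair seen fewer than `min_count`
    times in the bigram corpus is taken as evidence that a word was dropped there.
    """
    return [
        i for i in range(len(tokens) - 1)
        if bigrams[(tokens[i], tokens[i + 1])] < min_count
    ]


def suggest(left: str, right: str, trigrams: Counter,
            top_k: int = 5) -> List[Tuple[str, float]]:
    """Rank candidate middle words x for the context (left, x, right).

    Assumed score: count(left, x, right) divided by the total count of all
    trigrams with this left/right context, i.e. a relative-frequency estimate.
    """
    candidates = Counter({
        mid: c for (l, mid, r), c in trigrams.items() if l == left and r == right
    })
    total = sum(candidates.values())
    if total == 0:
        return []  # context never seen in the trigram corpus: no suggestions
    return [(word, c / total) for word, c in candidates.most_common(top_k)]


if __name__ == "__main__":
    # Hypothetical toy corpus: "I eat rice", "I drink water", "I eat rice".
    corpus = [["আমি", "ভাত", "খাই"], ["আমি", "জল", "খাই"], ["আমি", "ভাত", "খাই"]]
    bigrams = build_ngrams(corpus, 2)
    trigrams = build_ngrams(corpus, 3)

    sentence = ["আমি", "খাই"]  # "I eat", with the object dropped
    gaps = detect_gaps(sentence, bigrams)  # → [0]
    for i in gaps:
        print(sentence[i], sentence[i + 1], suggest(sentence[i], sentence[i + 1], trigrams))
```

**Design notes**

- **Trigram lookup.** `suggest` scans the whole trigram table on every query to keep the sketch short. A practical version would index trigrams by their (left, right) context once, so each lookup becomes a dictionary access.
- **Tokenization.** Bengali tokenization is assumed to be done beforehand. That includes splitting off punctuation such as the dari `।`.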