Saraiki Language Word Prediction And Spell Correction Framework

Muhammad Farjad Ali Raza, M. Naeem
Published in: 2022 24th International Multitopic Conference (INMIC)
Publication date: 2022-10-21
DOI: 10.1109/INMIC56986.2022.9972938 (https://doi.org/10.1109/INMIC56986.2022.9972938)
Citations: 1

Abstract

Word prediction, spelling correction, and word-similarity search are useful features for any language. Saraiki is one of the popular languages spoken in Pakistan. To the best of our knowledge, very little work in the literature addresses word prediction, spell correction, or finding similar words for the Saraiki language. In this paper we address these gaps by presenting a novel approach for word prediction, finding similar words, and spell correction in Saraiki. To this end, we used CBOW and Skip-Gram to vectorize the Saraiki language. We achieved a word prediction accuracy of 24% with word2vec and 29% with fastText. For word similarity, we achieved similarity scores of 0.35 and 0.39 for word2vec CBOW and word2vec Skip-Gram, respectively, and 0.35 and 0.41 for fastText CBOW and fastText Skip-Gram, respectively. Our spell correction results show that accuracy decreases as the number of wrong characters in a word increases. For sentence-level word prediction, we achieved an accuracy of 63% with RoBERTa and 58% with the distilled model.
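
As a rough illustration of the two vectorization schemes named in the abstract, the sketch below shows how training examples are typically formed: Skip-Gram predicts each context word from the center word, while CBOW predicts the center word from its surrounding context. This is not the authors' code. The function names, the window size, and the placeholder tokens are assumptions made for illustration only, and no embedding model is trained here.

```python
from typing import List, Tuple


def skipgram_pairs(tokens: List[str], window: int = 2) -> List[Tuple[str, str]]:
    """Build (center, context) pairs; Skip-Gram predicts context from center."""
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # skip the center word itself
                pairs.append((center, tokens[j]))
    return pairs


def cbow_examples(tokens: List[str], window: int = 2) -> List[Tuple[List[str], str]]:
    """Build (context words, center) examples; CBOW predicts center from context."""
    examples = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        context = [tokens[j] for j in range(lo, hi) if j != i]
        if context:  # a single-token sentence has no context
            examples.append((context, center))
    return examples


if __name__ == "__main__":
    # Placeholder tokens standing in for a tokenized Saraiki sentence (hypothetical).
    sentence = ["w1", "w2", "w3", "w4"]
    print(skipgram_pairs(sentence, window=1))
    print(cbow_examples(sentence, window=1))
```

In both schemes the window size controls how much surrounding text counts as context; the abstract does not state the value the authors used.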