Saraiki Language Word Prediction And Spell Correction Framework
Muhammad Farjad Ali Raza, M. Naeem
2022 24th International Multitopic Conference (INMIC), published 2022-10-21
DOI: 10.1109/INMIC56986.2022.9972938 (https://doi.org/10.1109/INMIC56986.2022.9972938)
Citations: 1
Abstract
Word prediction, spelling correction, and word-similarity search are useful features for any language. Saraiki is one of the popular languages spoken in Pakistan, yet to the best of our knowledge very little work in the literature addresses word prediction, spell correction, or similar-word search for it. In this paper we address this gap by presenting a novel approach to all three tasks for the Saraiki language. To this end, we vectorize Saraiki text with the CBOW and Skip-Gram models. Our results show a word prediction accuracy of 24% with word2vec and 29% with fastText. For word similarity, we obtain similarity scores of 0.35 and 0.39 with word2vec CBOW and word2vec Skip-Gram, respectively, and 0.35 and 0.41 with fastText CBOW and fastText Skip-Gram, respectively. Our spell correction results show that accuracy decreases as the number of wrong characters in a word increases. For sentence-level word prediction, we achieve an accuracy of 63% with RoBERTa and 58% with the distilled model.
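The abstract reports word-similarity scores but does not say how they are computed. As a minimal sketch, assuming cosine similarity between word vectors (a common choice for word2vec and fastText embeddings, not confirmed as the authors' metric), similar-word lookup could look like the following. The function names, the placeholder vocabulary keys, and the 3-dimensional toy vectors are all hypothetical and are not taken from the paper.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_u * norm_v)


def most_similar(word, embeddings, topn=3):
    """Rank every other vocabulary word by cosine similarity to `word`."""
    target = embeddings[word]
    scored = [
        (other, cosine_similarity(target, vec))
        for other, vec in embeddings.items()
        if other != word
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:topn]


# Hypothetical toy vectors; a trained CBOW or Skip-Gram model would
# supply real vectors, typically with 100 or more dimensions.
toy_embeddings = {
    "word_a": [1.0, 0.0, 0.0],
    "word_b": [0.9, 0.1, 0.0],
    "word_c": [0.0, 1.0, 0.0],
}

ranked = most_similar("word_a", toy_embeddings, topn=2)
```

Here `ranked` lists the two nearest neighbours of `word_a` in descending order of similarity. With trained embeddings, the same ranking step would return candidate similar words for a query word.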