Sharan. Sukkur IBA Journal of Computing and Mathematical Sciences (journal article), published 2024-05-16.
DOI: 10.30537/sjcms.v7i2.1273 (https://doi.org/10.30537/sjcms.v7i2.1273)
Prediction of Next Word in Balochi Language Using N-gram Model
Balochi is among the oldest languages, with a long history of use, and is spoken by approximately 10 million people worldwide. Compared with languages such as English, Urdu, and French, Balochi remains under-researched in natural language processing (NLP). Next-word prediction is an NLP technique that can support both the standardization of a language and the collection of corpora. This research aims to provide a next-word prediction system and an unambiguous corpus for Balochi. N-gram models (unigram, bigram, trigram, quad-gram, and higher orders) are used for next-word prediction. After intrinsic and extrinsic evaluation, the trained model was embedded in an application. The application assists keyboard typing: it helps users type faster and helps native speakers make fewer typing errors in less time. The results show that the five-gram model achieves the highest performance, 93%, followed by the quad-gram model at 80% and the trigram model at 76%.
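The abstract names the technique but not its details, so the following is only a minimal sketch of how a count-based n-gram next-word predictor can work. It assumes whitespace-tokenized sentences and a simple back-off that takes raw counts from the longest context seen in training (up to four preceding words for a five-gram model) and falls through to shorter contexts, ending at unigrams. No smoothing or discounting is applied. The class `NGramPredictor`, the helper `top_k_accuracy`, and the toy English corpus are hypothetical illustrations and do not represent the paper's implementation or data. The abstract also does not state what its 93%, 80%, and 76% figures measure, so the top-k hit rate below is an assumed example metric only.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple


class NGramPredictor:
    """Count-based next-word predictor with simple back-off (illustrative sketch only)."""

    def __init__(self, max_n: int = 5) -> None:
        # max_n = 5 means contexts of up to 4 preceding words (a five-gram model).
        self.max_n = max_n
        # counts[k][context] counts the words seen right after a k-word context;
        # k = 0 (empty context) holds plain unigram counts.
        self.counts: Dict[int, Dict[Tuple[str, ...], Counter]] = {
            k: defaultdict(Counter) for k in range(max_n)
        }

    def train(self, sentences: List[List[str]]) -> None:
        for tokens in sentences:
            for i, word in enumerate(tokens):
                # Record `word` after every preceding context of length 0..max_n-1 that fits.
                for k in range(min(self.max_n, i + 1)):
                    context = tuple(tokens[i - k:i])
                    self.counts[k][context][word] += 1

    def predict(self, history: List[str], top_k: int = 3) -> List[str]:
        # Start with the longest usable context, then back off to shorter ones.
        for k in range(min(self.max_n - 1, len(history)), -1, -1):
            context = tuple(history[len(history) - k:])
            candidates = self.counts[k].get(context)
            if candidates:
                return [w for w, _ in candidates.most_common(top_k)]
        return []  # Only reached when no words have been counted at all.


def top_k_accuracy(model: NGramPredictor, sentences: List[List[str]], top_k: int = 3) -> float:
    """Assumed metric: share of positions where the true next word is among the top-k suggestions."""
    hits = total = 0
    for tokens in sentences:
        for i in range(1, len(tokens)):
            total += 1
            if tokens[i] in model.predict(tokens[:i], top_k):
                hits += 1
    return hits / total if total else 0.0


# Toy placeholder corpus (English words standing in for tokenized Balochi text).
corpus = [
    "the cat sat on the mat".split(),
    "the cat sat on the sofa".split(),
    "the dog sat on the mat".split(),
]
model = NGramPredictor(max_n=5)
model.train(corpus)
print(model.predict("the cat sat on".split()))  # uses the longest (4-word) context
print(model.predict(["on", "the"]))             # uses a 2-word context
print(model.predict(["zebra"]))                 # unseen word: backs off to unigrams
print(top_k_accuracy(model, corpus))
```

Backing off to shorter contexts keeps the predictor from returning nothing when a long context was never seen, which matters for a small corpus. A fuller system might replace raw counts with a smoothed estimate such as Kneser-Ney.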