SymSpell4Burmese: Symmetric Delete Spelling Correction Algorithm (SymSpell) for Burmese Spelling Checking
Ei Phyu Phyu Mon, Ye Kyaw Thu, Than Than Yu, Aye Wai Oo
2021 16th International Joint Symposium on Artificial Intelligence and Natural Language Processing (iSAI-NLP), published 2021-12-21
DOI: 10.1109/iSAI-NLP54397.2021.9678171 (https://doi.org/10.1109/iSAI-NLP54397.2021.9678171)

A spell checker is a crucial natural language processing (NLP) tool, and it has become more important with the growth of text-based communication at work, information retrieval, fraud detection, search engines, social media and research. In this paper, we study automatic spelling checking for Burmese by applying the Symmetric Delete Spelling Correction Algorithm (SymSpell). We ran experiments with an open source SymSpell Python library, applying the Burmese spelling training corpus that we are developing, together with four frequency dictionaries, to ten error types. In the error detection phase, an N-gram language model is used to check the spelling training corpus against a dictionary. In the correction phase, SymSpell proposes candidate corrections within a specified maximum edit distance from the misspelled word. After the candidates are generated, the best correction in the given context is chosen automatically: the candidate with the highest frequency at the minimum edit distance. We investigated the performance on each error type and studied how the choice of dictionary matters, depending on the average term length and the maximum edit distance, for a SymSpell-based Burmese spell checker. Moreover, we observed that syllable-level segmentation with a maximum edit distance of 3 gives faster and higher-quality results than word-level segmentation.
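
To make the correction phase concrete, the following is a minimal sketch of the symmetric delete idea in plain Python. It is not the authors' code and not the API of the SymSpell library they used. The function names, the toy English frequency dictionary, and the use of plain Levenshtein distance (SymSpell implementations commonly use a Damerau-Levenshtein variant) are all assumptions made for illustration.

```python
from collections import defaultdict


def deletes(term, max_distance):
    """Return term plus every string made by deleting up to max_distance characters."""
    results = {term}
    frontier = {term}
    for _ in range(max_distance):
        frontier = {t[:i] + t[i + 1:] for t in frontier for i in range(len(t))}
        results |= frontier
    return results


def edit_distance(a, b):
    """Plain Levenshtein distance (an assumption; not the library's metric)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,               # delete from a
                            curr[j - 1] + 1,           # insert into a
                            prev[j - 1] + (ca != cb)))  # substitute
        prev = curr
    return prev[-1]


def build_index(frequencies, max_distance):
    """Map every delete-variant of every dictionary term back to that term."""
    index = defaultdict(set)
    for term in frequencies:
        for variant in deletes(term, max_distance):
            index[variant].add(term)
    return index


def lookup(word, frequencies, index, max_distance):
    """Pick the candidate with minimum edit distance, breaking ties by frequency."""
    candidates = set()
    for variant in deletes(word, max_distance):
        candidates |= index.get(variant, set())
    scored = [(edit_distance(word, c), -frequencies[c], c) for c in candidates]
    scored = [s for s in scored if s[0] <= max_distance]
    return min(scored)[2] if scored else None


# Hypothetical toy dictionary: term -> corpus frequency.
FREQUENCIES = {"spell": 50, "spelt": 5, "shell": 20, "checker": 30}
INDEX = build_index(FREQUENCIES, max_distance=2)

print(lookup("spel", FREQUENCIES, INDEX, max_distance=2))  # spell
```

In this sketch only deletions are generated, for both the dictionary terms and the input, which keeps candidate generation cheap; the full edit distance is computed only for the few candidates that share a delete-variant with the input. The same functions work on any sequence of characters, so the terms could be Burmese words or syllables, but the segmentation step studied in the paper is not modelled here.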