A Sinhala and Tamil Extension to Generic Environment for Context-aware Correction
Lakshikka Sithamparanathan, T. Uthayasanker
2019 National Information Technology Conference (NITC), 2019-10-01
DOI: 10.1109/NITC48475.2019.9114399
Citations: 2
Abstract
Spell checkers for European and Indian languages have been studied extensively. However, low-resource languages such as Tamil and Sinhala have received limited attention in this problem space, possibly because of their highly inflectional and morphologically rich nature. No fully functional context-aware spell-checking system exists for them, particularly as open source. In this paper, we extend the Generic Environment for Context-aware Correction approach to two resource-scarce languages: Sinhala and Tamil. Experimental results show that our system detects spelling errors well and provides the most suitable suggestions for correcting misspelled words, with a minimum accuracy of 85% for Tamil and 70% for Sinhala. This is the first context-aware spell corrector for the Sinhala language. Compared to prior context-aware spell correctors for Tamil, it offers 1) a modularized architecture and 2) increased coverage and accuracy. Moreover, this study produced a Tamil and Sinhala spell correction benchmark dataset. Both the dataset and the tools are available for public use.