A Sinhala and Tamil Extension to Generic Environment for Context-aware Correction

Lakshikka Sithamparanathan, T. Uthayasanker
Published in: 2019 National Information Technology Conference (NITC), 2019-10-01
DOI: 10.1109/NITC48475.2019.9114399 (https://doi.org/10.1109/NITC48475.2019.9114399)
Citations: 2

Abstract

Several lines of research exist on spell checkers for European and Indian languages. However, low-resource languages such as Tamil and Sinhala have received limited attention in this problem space, possibly because of their highly inflectional and morphologically rich nature. There is no fully functional context-aware spell-checking system for these languages, especially as open source. In this paper, a generic environment for context-aware spell correction is extended to two resource-scarce languages, Sinhala and Tamil. Experimental results show that our system detects spelling errors well and provides the most suitable suggestions for correcting misspelled words, with a minimum accuracy of 85% for Tamil and 70% for Sinhala. This is the first context-aware spell corrector for the Sinhala language. Compared to prior context-aware spell correctors for Tamil, it advances in 1) modularized architecture and 2) increased coverage and accuracy. Moreover, this study produced a Tamil and Sinhala spell correction benchmark dataset. Both the dataset and the tools are available for public use.
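To make the term "context-aware" concrete, the following is a minimal sketch of one common way such correction can work: generate candidates within one edit of an unknown word, then rank them by how often they follow the previous word in a corpus. It is not the system described in the paper. The toy English corpus, the bigram ranking, and the names `edits1`, `train`, and `correct` are all assumptions made for illustration.

```python
from collections import Counter

# Hypothetical toy corpus; a real system would train on a large
# Tamil or Sinhala corpus instead.
CORPUS = [
    "the cat sat on the mat",
    "the cat ate the fish",
    "he wore a hat",
    "she bought a hat",
]

def edits1(word, alphabet):
    """Return all strings one delete, transpose, replace, or insert away."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [l + r[1:] for l, r in splits if r]
    transposes = [l + r[1] + r[0] + r[2:] for l, r in splits if len(r) > 1]
    replaces = [l + c + r[1:] for l, r in splits if r for c in alphabet]
    inserts = [l + c + r for l, r in splits for c in alphabet]
    return set(deletes + transposes + replaces + inserts)

def train(sentences):
    """Count unigrams and bigrams, with "<s>" marking the sentence start."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split()
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams

def correct(sentence, unigrams, bigrams):
    """Replace each unknown word with the candidate that best fits its left context."""
    # The alphabet is derived from the corpus, so the same code works on any
    # script (at the code-point level, which is itself a simplification).
    alphabet = {ch for w in unigrams if w != "<s>" for ch in w}
    prev, out = "<s>", []
    for word in sentence.split():
        if word in unigrams:
            best = word  # known word: left unchanged in this sketch
        else:
            candidates = [c for c in edits1(word, alphabet)
                          if c in unigrams and c != "<s>"]
            # Rank by bigram count with the previous word, then by word frequency.
            best = max(candidates,
                       key=lambda c: (bigrams[(prev, c)], unigrams[c]),
                       default=word)
        out.append(best)
        prev = best
    return " ".join(out)

unigrams, bigrams = train(CORPUS)
print(correct("the xat ate the fish", unigrams, bigrams))  # the cat ate the fish
print(correct("she bought a xat", unigrams, bigrams))      # she bought a hat
```

The same misspelling, "xat", is corrected to "cat" after "the" and to "hat" after "a", which a context-free corrector relying on word frequency alone could not distinguish here. For Tamil and Sinhala, edits at the level of single code points may be too fine-grained because of combining vowel signs, so a practical system would likely operate on larger units; that choice is outside the scope of this sketch.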