Automatic Spell Checker and Correction for Under-represented Spoken Languages: Case Study on Wolof

IF 3.4 · Zone 2 (Engineering & Technology) · Q2 · TRANSPORTATION SCIENCE & TECHNOLOGY
Thierno Ibrahima Cissé, F. Sadat
DOI: https://doi.org/10.48550/arXiv.2305.12694
Publication date: 2023-05-22
Citations: 0

Abstract

This paper presents a spell checking and correction tool designed specifically for Wolof, an under-represented spoken language in Africa. The proposed spell checker combines a trie data structure, dynamic programming, and the weighted Levenshtein distance to generate suggestions for misspelled words. We created novel linguistic resources for Wolof, such as a lexicon and a corpus of misspelled words, using a semi-automatic approach that combines manual and automatic annotation methods. Despite the limited data available for the Wolof language, the spell checker achieved a predictive accuracy of 98.31% and a suggestion accuracy of 93.33%. Our primary focus remains the revitalization and preservation of Wolof as an Indigenous and spoken language in Africa, which motivates our efforts to develop novel linguistic resources. This work contributes to the growth of computational tools and resources for the Wolof language and provides a strong foundation for future studies in automatic spell checking and correction.
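The combination of a trie, dynamic programming, and a weighted Levenshtein distance named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the substitution weights in `CHEAP_SUBSTITUTIONS`, the `max_cost` threshold, and the five sample words are all assumptions chosen for illustration. The idea is to walk the trie once, carrying one row of the edit-distance table per node, and to abandon any branch whose best possible cost already exceeds the threshold.

```python
from __future__ import annotations

import unicodedata


class TrieNode:
    """One trie node; `word` is set when a lexicon entry ends here."""

    def __init__(self) -> None:
        self.children: dict[str, TrieNode] = {}
        self.word: str | None = None


def build_trie(lexicon):
    """Insert every lexicon entry (NFC-normalized) into a trie."""
    root = TrieNode()
    for entry in lexicon:
        entry = unicodedata.normalize("NFC", entry)
        node = root
        for ch in entry:
            node = node.children.setdefault(ch, TrieNode())
        node.word = entry
    return root


# Hypothetical weights (not taken from the paper): swapping a plain vowel
# for its accented form is assumed to be a cheaper error than other swaps.
CHEAP_SUBSTITUTIONS = {
    frozenset(("e", "\u00eb")): 0.5,  # e / ë
    frozenset(("a", "\u00e0")): 0.5,  # a / à
    frozenset(("o", "\u00f3")): 0.5,  # o / ó
}


def substitution_cost(a, b):
    if a == b:
        return 0.0
    return CHEAP_SUBSTITUTIONS.get(frozenset((a, b)), 1.0)


def suggest(root, word, max_cost=1.0, insert_cost=1.0, delete_cost=1.0):
    """Return (candidate, cost) pairs within max_cost, cheapest first."""
    word = unicodedata.normalize("NFC", word)
    results = []

    def walk(node, ch, prev_row):
        # row[j] = cost of turning word[:j] into the trie prefix ending at ch
        row = [prev_row[0] + insert_cost]
        for j in range(1, len(word) + 1):
            row.append(min(
                row[j - 1] + delete_cost,        # drop word[j-1]
                prev_row[j] + insert_cost,       # add ch
                prev_row[j - 1] + substitution_cost(word[j - 1], ch),
            ))
        if node.word is not None and row[-1] <= max_cost:
            results.append((node.word, row[-1]))
        # Costs are non-negative, so a row whose minimum exceeds max_cost
        # cannot lead to an acceptable candidate deeper in the trie.
        if min(row) <= max_cost:
            for next_ch, child in node.children.items():
                walk(child, next_ch, row)

    first_row = [j * delete_cost for j in range(len(word) + 1)]
    for ch, child in root.children.items():
        walk(child, ch, first_row)
    results.sort(key=lambda pair: (pair[1], pair[0]))
    return results


# Illustrative sample entries only, not the lexicon built in the paper.
trie = build_trie(["k\u00ebr", "xale", "nit", "j\u00e0ng", "j\u00e0mm"])
print(suggest(trie, "ker"))  # [('kër', 0.5)]
```

Sharing the table rows across words with a common prefix is what makes the trie worthwhile here: each prefix is scored once, however many lexicon entries begin with it.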
Source journal: International Journal of Rail Transportation (TRANSPORTATION SCIENCE & TECHNOLOGY)
CiteScore: 6.90
Self-citation rate: 15.00%
Articles published: 51
Journal description: The unprecedented modernization and expansion of rail transportation systems will require substantial new efforts in scientific research for field-deployable technologies. The International Journal of Rail Transportation (IJRT) aims to provide an open forum for scientists, researchers, and engineers around the world to promote the exchange of the latest scientific and technological innovations in rail transportation, and to advance state-of-the-art engineering and practices for various types of rail-based transportation systems. IJRT covers all main areas of rail vehicle, infrastructure, traction power, operation, communication, and environment. The journal publishes original, significant articles on topics in dynamics and mechanics of rail vehicle, track, and bridge systems; planning and design, construction, operation, inspection, and maintenance of rail infrastructure; train operation, control, scheduling, and management; rail electrification; signalling and communication; and environmental impacts such as vibration and noise. The editorial policy of the journal abides by the highest standards of research rigor, ethics, and academic freedom. All articles published in IJRT have undergone rigorous peer review, based on initial editor screening and anonymous refereeing by independent experts. There are no page charges, and colour figures are included in the online edition free of charge.