Spelling Error Patterns in Typed Yorùbá Text Documents

Asahiah Franklin Oladiipo, Onifade Mary Taiwo, Adegunlehin Abayomi Emmanuel
Journal: International Journal of Information Engineering and Electronic Business
Published: 2020-12-08
DOI: 10.5815/ijieeb.2020.06.03
Citations: 1

Abstract

While most of the world's major languages have a long written history, Yorùbá is a relatively young language as far as writing is concerned. It is therefore under-resourced with respect to tools for processing it in digital form, and spell checking is one such tool. An analysis of spelling error patterns is fundamental to building a good spell checker. We addressed this challenge in this article, and our findings showed that, in general, spelling error patterns in Yorùbá follow those of other languages. In the specifics, however, there were clear departures from the norm. Diacritic-related misspellings accounted for more than 80% of all errors, and the proportion of misspelled words containing a single edit error fell below the generally expected minimum threshold of 80%. In addition, most errors were vowel-related, with consonants accounting for less than 15% of all errors. Word length does not appear to have any direct bearing on the number of errors in a word. The research showed that diacritics have a greater impact on spelling errors in Yorùbá, where they are used mainly for tone marking and account for more than 80% of spelling errors, than in languages such as Brazilian Portuguese and Spanish, where diacritics distinguish characters and diacritic-related errors make up less than 60% of all errors. We therefore conclude that, while the character set of a language largely determines the distribution of its spelling errors, the purpose for which diacritics are employed in a language also affects that distribution.
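The abstract does not state how errors were counted. As a minimal sketch, assuming each misspelling is analysed as an (intended, typed) word pair, the Python below counts edits with optimal string alignment distance (a restricted Damerau-Levenshtein distance; one common choice, not necessarily the authors' method) over units of one base letter plus its combining marks. It flags a pair as diacritic-only when the two words match once all combining marks are stripped. The function names are hypothetical, and stripping every combining mark is an assumption: it treats underdot errors (ẹ, ọ, ṣ) as diacritic errors alongside tone-mark errors.

```python
import unicodedata


def clusters(word: str) -> list[str]:
    """Split a word into units of one base character plus its combining marks.

    NFD puts tone marks (acute/grave) and underdots after their base letter
    in canonical order, so a stacked form such as "ẹ̀" becomes one unit.
    """
    units: list[str] = []
    for ch in unicodedata.normalize("NFD", word):
        if unicodedata.combining(ch) and units:
            units[-1] += ch  # attach the mark to the preceding base letter
        else:
            units.append(ch)
    return units


def strip_diacritics(word: str) -> str:
    """Remove every combining mark (tone marks and underdots alike)."""
    return "".join(
        ch for ch in unicodedata.normalize("NFD", word) if not unicodedata.combining(ch)
    )


def osa_distance(a: list[str], b: list[str]) -> int:
    """Optimal string alignment distance: insertions, deletions,
    substitutions and adjacent transpositions each cost 1."""
    n, m = len(a), len(b)
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        d[i][0] = i
    for j in range(m + 1):
        d[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,  # deletion
                d[i][j - 1] + 1,  # insertion
                d[i - 1][j - 1] + cost,  # substitution (or match)
            )
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # transposition
    return d[n][m]


def classify(intended: str, typed: str) -> dict:
    """Hypothetical classifier for one (intended, typed) misspelling pair."""
    distance = osa_distance(clusters(intended), clusters(typed))
    differs = unicodedata.normalize("NFD", intended) != unicodedata.normalize("NFD", typed)
    return {
        "distance": distance,
        "single_edit": distance == 1,
        # Same base letters, different marks: the error lies only in the diacritics.
        "diacritic_only": differs and strip_diacritics(intended) == strip_diacritics(typed),
    }


for intended, typed in [("ilé", "ile"), ("ọmọ", "omo"), ("ajá", "jaá")]:
    print(intended, typed, classify(intended, typed))
```

Under this convention a missing tone mark or underdot costs one substitution, so "ọmọ" typed as "omo" needs two edits even though only diacritics are wrong, while "ajá" typed as "jaá" is a single transposition that is not diacritic-related. Counting over raw code points instead would score a stacked form such as "ẹ̀" differently, so the unit of comparison is itself a design choice.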