Dictionary Distribution Based on Number of Characters for Damerau-Levenshtein Distance Spell Checker Optimization

U. Pujianto, A. Wibawa, Raditha Ulfah
Published in: 2020 6th International Conference on Science in Information Technology (ICSITech), 2020-10-21
DOI: 10.1109/ICSITech49800.2020.9392059 (https://doi.org/10.1109/ICSITech49800.2020.9392059)
Citations: 1

Abstract

Damerau-Levenshtein Distance is an algorithm used for word correction. It measures how one word can be changed into another using a specified set of edit operations: substitution, insertion, deletion, and transposition. Its weakness is a long processing time: to display suggestions for a misspelled string, the system must calculate the distance between that string and every word in the dictionary. Processing time grows with the size of the dictionary; the Indonesian Dictionary, for example, has more than 30,000 basic words. This study therefore distributes the dictionary based on the number of characters in each word to shorten the processing time. Using the distributed dictionary reduces the processing time of the Damerau-Levenshtein Distance algorithm by 29.04 seconds.
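The idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how the distributed dictionary is queried, so the sketch assumes that words are grouped by length and that only groups whose length is within `max_dist` of the misspelled word are searched (the edit distance between two words is never smaller than their length difference). The distance function is the restricted variant of Damerau-Levenshtein, often called optimal string alignment. The names `osa_distance`, `build_length_buckets`, `suggest`, and `max_dist` are hypothetical.

```python
from collections import defaultdict


def osa_distance(a: str, b: str) -> int:
    """Restricted Damerau-Levenshtein (optimal string alignment) distance.

    Counts substitutions, insertions, deletions, and transpositions of
    two adjacent characters.
    """
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all i characters of a
    for j in range(cols):
        d[0][j] = j  # insert all j characters of b
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
            # Transposition of two adjacent characters.
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]


def build_length_buckets(words):
    """Group dictionary words by their number of characters."""
    buckets = defaultdict(list)
    for word in words:
        buckets[len(word)].append(word)
    return buckets


def suggest(word, buckets, max_dist=1):
    """Return dictionary words within max_dist edits, closest first.

    Assumption of this sketch: only buckets whose word length differs
    from len(word) by at most max_dist are compared.
    """
    candidates = []
    for length in range(len(word) - max_dist, len(word) + max_dist + 1):
        for candidate in buckets.get(length, []):
            dist = osa_distance(word, candidate)
            if dist <= max_dist:
                candidates.append((dist, candidate))
    return [w for _, w in sorted(candidates)]
```

With a small word list such as `["makan", "minum", "makna", "main", "makanan"]`, calling `suggest("maakn", build_length_buckets(words), max_dist=1)` compares the input only against words of 4 to 6 characters, so `"makanan"` is never scored. How much time this saves on a real dictionary depends on how the words are distributed across lengths.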