Dictionary Distribution Based on Number of Characters for Damerau-Levenshtein Distance Spell Checker Optimization
Authors: U. Pujianto, A. Wibawa, Raditha Ulfah
Published in: 2020 6th International Conference on Science in Information Technology (ICSITech), 2020-10-21
DOI: https://doi.org/10.1109/ICSITech49800.2020.9392059
Damerau-Levenshtein Distance is an algorithm for word correction that measures how many edit operations are needed to transform one word into another. The permitted edit operations are substitution, insertion, deletion, and transposition. Its main weakness is a long processing time: to display suggestions for a misspelled string, the system must compute the distance between that string and every word in the dictionary. The processing time grows with the size of the dictionary; the Indonesian Dictionary, for example, has more than 30,000 basic words. This study therefore proposes distributing the dictionary according to the number of characters in each word in order to shorten the processing time. Using the distributed dictionary reduces the processing time of the Damerau-Levenshtein Distance algorithm by 29.04 seconds.
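
The abstract does not describe the implementation, so the following Python sketch only illustrates the general idea under stated assumptions. It uses the restricted (optimal string alignment) variant of Damerau-Levenshtein distance, groups dictionary words by length, and compares a misspelled word only against groups whose length differs by at most `max_distance`. That pruning is safe because the edit distance between two words is never smaller than the difference in their lengths. The function names, the `max_distance` parameter, and the sample words are hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def osa_distance(a: str, b: str) -> int:
    """Restricted Damerau-Levenshtein (optimal string alignment) distance."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all i characters of a
    for j in range(cols):
        d[0][j] = j  # insert all j characters of b
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution (or match)
            )
            # Transposition of two adjacent characters.
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[rows - 1][cols - 1]


def build_length_buckets(words):
    """Distribute dictionary words into buckets keyed by character count."""
    buckets = defaultdict(list)
    for w in words:
        buckets[len(w)].append(w)
    return buckets


def suggest(word, buckets, max_distance=1):
    """Return suggestions, searching only buckets of a compatible length.

    Assumption: a candidate whose length differs from the input by more than
    max_distance cannot be within max_distance edits, so it is skipped.
    """
    candidates = []
    low, high = len(word) - max_distance, len(word) + max_distance
    for length in range(low, high + 1):
        for cand in buckets.get(length, []):
            dist = osa_distance(word, cand)
            if dist <= max_distance:
                candidates.append((dist, cand))
    return [w for _, w in sorted(candidates)]


# Hypothetical mini-dictionary of Indonesian words, for illustration only.
buckets = build_length_buckets(["makan", "makanan", "main", "buku", "makna"])
print(suggest("maakn", buckets))
```

In this sketch the seven-letter word `makanan` is never compared against the five-letter input `maakn`, which is the kind of saving a length-based distribution is meant to provide. How much time is actually saved depends on the dictionary's length distribution and on the distance threshold chosen.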