Santipong Thaiprayoon, A. Kongthon, C. Haruechaiyasak
Title: ThaiQCor 2.0: Thai Query Correction via Soundex and Word Approximation
Venue: 2018 5th International Conference on Advanced Informatics: Concept Theory and Applications (ICAICTA)
Published: 2018-08-01
DOI: 10.1109/ICAICTA.2018.8541321
Citations: 3
Abstract
Search engines are an important tool for finding information on the Internet. One of the most significant problems in searching is mistyped queries, which arise from typographical and cognitive errors. Typographical errors normally result from hitting adjacent keys on the keyboard. Cognitive errors stem from the user not knowing how a query term is spelled. To address both, we designed and developed a new version of our Thai query correction program, ThaiQCor 2.0, which handles typographical and cognitive errors alike. The program combines two approaches: word approximation and soundex. Word approximation uses approximate string retrieval with character-level edit distance and targets typographical errors. Soundex applies grapheme-to-phoneme conversion and then performs approximate string matching by computing an edit distance over weighted phonemes in the resulting phoneme sequences; it targets cognitive errors. The candidate words from both approaches are ranked by score and suggested to the user. Experimental results show that ThaiQCor 2.0 achieves an accuracy of 97.11% for place names and 89.76% for person names.
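
The two matching steps described in the abstract can be illustrated with a minimal Python sketch. It is not the ThaiQCor 2.0 implementation. The similar-phoneme groups, the 0.5 substitution weight, the choice to score a candidate by the smaller of its two distances, and the names `edit_distance`, `phoneme_cost`, and `suggest` are all assumptions made for illustration. A real system would also need a Thai grapheme-to-phoneme converter; here the caller supplies one as `to_phonemes`, and the usage line passes `list` as a toy stand-in that treats each Latin letter as a phoneme.

```python
from typing import Callable, List, Sequence


def unit_cost(x: str, y: str) -> float:
    """Plain Levenshtein substitution cost: 0 for a match, 1 otherwise."""
    return 0.0 if x == y else 1.0


def edit_distance(a: Sequence[str], b: Sequence[str],
                  sub_cost: Callable[[str, str], float] = unit_cost) -> float:
    """Edit distance between two symbol sequences (characters or phonemes).

    Insertions and deletions cost 1; substitutions cost sub_cost(x, y).
    """
    prev = [float(j) for j in range(len(b) + 1)]
    for i, x in enumerate(a, start=1):
        curr = [float(i)]
        for j, y in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1.0,                 # delete x
                curr[j - 1] + 1.0,             # insert y
                prev[j - 1] + sub_cost(x, y),  # substitute or match
            ))
        prev = curr
    return prev[-1]


# Hypothetical groups of similar-sounding phonemes (not taken from the paper).
SIMILAR_PHONEMES = [{"p", "b"}, {"t", "d"}, {"k", "g"}]


def phoneme_cost(x: str, y: str) -> float:
    """Assumed weighting: confusing similar phonemes costs less than others."""
    if x == y:
        return 0.0
    if any(x in group and y in group for group in SIMILAR_PHONEMES):
        return 0.5
    return 1.0


def suggest(query: str, lexicon: Sequence[str],
            to_phonemes: Callable[[str], List[str]], top_k: int = 3) -> List[str]:
    """Rank lexicon words by the smaller of the two distances (an assumption)."""
    query_phonemes = to_phonemes(query)
    scored = []
    for word in lexicon:
        char_d = edit_distance(query, word)  # word approximation
        phon_d = edit_distance(query_phonemes, to_phonemes(word), phoneme_cost)  # soundex-style
        scored.append((min(char_d, phon_d), word))
    scored.sort()
    return [word for _, word in scored[:top_k]]


# Toy usage with Latin letters standing in for phonemes.
print(suggest("bangkik", ["bangkok", "phuket", "pattaya"], list, top_k=1))
```

Sharing one distance routine with a pluggable substitution cost keeps the character-level and phoneme-level paths comparable. How the paper actually weights phonemes and merges the two candidate lists is not stated in the abstract.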