Survey of Query correction for Thai business-oriented information retrieval
Authors: Phongsathorn Kittiworapanya, Nuttapong Saelek, Anuruth Lertpiya, Tawunrat Chalothorn
Published in: 2020 15th International Joint Symposium on Artificial Intelligence and Natural Language Processing (iSAI-NLP)
Publication date: 2020-11-18
DOI: 10.1109/iSAI-NLP51646.2020.9376809 (https://doi.org/10.1109/iSAI-NLP51646.2020.9376809)
Citation count: 1
Abstract
The importance of effective Thai information retrieval (IR) increases as more businesses in Thailand undergo digital transformation. However, previous research on Thai IR systems has mainly focused on web search engines. This study focuses on using query correction to reduce user errors and thereby improve Thai IR. Experiments are conducted on our business-oriented Thai IR task (bTIR). Our investigation yields three notable findings. First, cognitive errors are less of an issue in a business setting; thus, homophone correction methods provide little to no benefit for bTIR. Second, approximation-based spelling correction methods can significantly reduce search performance; thus, partial matching on a full dictionary, such as symmetric delete indexing (SymSpell), should be preferred over non-optimal search methods. Third, we introduce a re-ranking algorithm for query correctors that feature multiple sub-correctors (e.g., ThaiQCor 2.0), which results in better performance across multiple configurations.
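To make the idea of symmetric delete indexing concrete, the following is a minimal Python sketch of the general technique, not the authors' implementation and not the SymSpell library's API. The function names (`deletes`, `build_index`, `lookup`), the sample dictionary, and the choice of plain Levenshtein distance for verification are all assumptions made for illustration. The sketch also works on Unicode code points, which for Thai text means that vowel and tone marks count as separate characters; that is a simplification.

```python
from collections import defaultdict
from itertools import combinations


def deletes(word, max_distance):
    """Return the word plus every string obtained by deleting
    up to max_distance characters from it."""
    variants = {word}
    for k in range(1, min(max_distance, len(word)) + 1):
        for keep in combinations(range(len(word)), len(word) - k):
            variants.add("".join(word[i] for i in keep))
    return variants


def edit_distance(a, b):
    """Plain Levenshtein distance (illustrative choice of metric)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]


def build_index(dictionary, max_distance=1):
    """Map each delete-variant to the dictionary terms that produce it."""
    index = defaultdict(set)
    for term in dictionary:
        for variant in deletes(term, max_distance):
            index[variant].add(term)
    return index


def lookup(query, index, max_distance=1):
    """Return dictionary terms within max_distance edits of the query,
    closest first. Shared delete-variants give candidates, which are
    then verified with the real edit distance."""
    candidates = set()
    for variant in deletes(query, max_distance):
        candidates |= index.get(variant, set())
    scored = [(edit_distance(query, term), term) for term in candidates]
    return [term for dist, term in sorted(scored) if dist <= max_distance]


# Hypothetical business vocabulary, used only for this example.
index = build_index(["invoice", "receipt", "refund"], max_distance=1)
print(lookup("invoce", index))
```

Because deletions are generated for both the dictionary terms and the query, the lookup covers the whole dictionary through hash lookups alone rather than an approximate scan, which is the property the abstract contrasts with non-optimal search methods. The verification step is needed because two strings that share a delete-variant can be up to twice `max_distance` edits apart.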