Automatic Synonym Extraction and Context-based Query Reformulation for Points-of-Interest Search
Authors: Pengfei Li, Gaurav
Venue: 2023 IEEE 39th International Conference on Data Engineering (ICDE)
Published: 2023-04-01
DOI: 10.1109/ICDE55515.2023.00235 (https://doi.org/10.1109/ICDE55515.2023.00235)
Abstract
In modern search engines, synonyms are widely used for query reformulation to improve search recall and relevance. The search query is reformulated using synonymous terms that are semantically related to the original query, and the reformulated queries improve or augment the original query to retrieve more relevant results. However, there are four main challenges in production environments: (1) How to obtain high-quality synonyms and validate their effectiveness, especially for low-resource languages? (2) How to prevent search intent drift caused by over-reformulating a query that is already correct? (3) How to efficiently keep the synonyms and models up to date in large-scale production systems? (4) How to ensure that the latency introduced by query reformulation does not affect the user's search experience? In this paper, we address these challenges by introducing a context-based query reformulation system for Points-of-Interest (POI) search, built on synonyms automatically extracted from search logs and language models. The synonyms are automatically validated using historical query samples. We also propose a lightweight term identification model that prevents over-reformulation by considering query context during reformulation. The proposed methods are unsupervised or self-supervised and can be easily applied to large-scale production systems. We deploy our system in eight Southeast Asian countries that have both English and low-resource languages. Both offline evaluations and online A/B tests show that our system significantly enhances search recall and relevance.
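To make the idea of context-gated synonym reformulation concrete, here is a minimal Python sketch. It is not the authors' system: the synonym table, the protected-phrase list, and the function names `should_rewrite` and `reformulate` are all hypothetical, and a simple phrase lookup stands in for the lightweight term identification model mentioned in the abstract.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical synonym table. The paper extracts such pairs from search logs
# and language models; this sketch simply hard-codes a few examples.
SYNONYMS: Dict[str, List[str]] = {
    "apt": ["apartment"],
    "st": ["street"],
    "mall": ["shopping centre"],
}

# Hypothetical phrases in which a term should not be rewritten, because
# rewriting it would change the search intent (e.g. a brand name).
PROTECTED_PHRASES: Set[Tuple[str, ...]] = {("st", "regis")}


def should_rewrite(tokens: List[str], i: int) -> bool:
    """Toy stand-in for a term identification model.

    Returns False when the token at position i lies inside a protected phrase.
    """
    for phrase in PROTECTED_PHRASES:
        n = len(phrase)
        # Check every window of length n that covers position i.
        for start in range(max(0, i - n + 1), min(i, len(tokens) - n) + 1):
            if tuple(tokens[start:start + n]) == phrase:
                return False
    return True


def reformulate(query: str) -> List[str]:
    """Return the normalized query followed by single-term synonym variants."""
    tokens = query.lower().split()
    variants = [" ".join(tokens)]
    for i, tok in enumerate(tokens):
        if tok in SYNONYMS and should_rewrite(tokens, i):
            for syn in SYNONYMS[tok]:
                variants.append(" ".join(tokens[:i] + [syn] + tokens[i + 1:]))
    return variants
```

Under these assumptions, `reformulate("orchard st")` yields the original query plus an "orchard street" variant, while `reformulate("St Regis hotel")` returns only the original, since the context check blocks the rewrite of "st". A production system would replace the phrase lookup with a learned model and would validate synonym pairs against historical queries, as the abstract describes.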