Automatic Synonym Extraction and Context-based Query Reformulation for Points-of-Interest Search

2023 IEEE 39th International Conference on Data Engineering (ICDE). Pub Date: 2023-04-01. DOI: 10.1109/ICDE55515.2023.00235

Pengfei Li, Gaurav



Abstract

In modern search engines, synonyms are widely used for query reformulation to improve search recall and relevance. The search query is reformulated using synonymous terms that are semantically related to the original query, and the reformulated queries are used to improve or augment the original query so that more relevant results are retrieved. However, there are four main challenges in production environments: (1) How to obtain high-quality synonyms and validate their effectiveness, especially for low-resource languages? (2) How to prevent search intent drift caused by over-reformulating a query that is already correct? (3) How to efficiently keep the synonyms and models up to date in large-scale production systems? (4) How to ensure that the latency introduced by query reformulation does not affect the user's search experience? In this paper, we address these challenges by introducing a context-based query reformulation system for Points-of-Interest (POI) search, based on synonyms automatically extracted from search logs and language models. The synonyms are automatically validated using historical query samples. We also propose a lightweight term identification model that prevents over-reformulation by considering query context during reformulation. The proposed methods are unsupervised or self-supervised and can be easily applied to large-scale production systems. We deploy our system in eight Southeast Asian countries that use both English and low-resource languages. Both offline evaluations and online A/B tests show that our system significantly enhances search recall and relevance.
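The idea of context-gated synonym reformulation can be illustrated with a small sketch. This is not the authors' system: the abstract describes a learned term identification model and synonyms mined from search logs, whereas the code below assumes a hand-written synonym table (`SYNONYMS`) and a hand-written neighbour rule (`BLOCKING_CONTEXT`), both hypothetical, purely to show how query context can veto a replacement and so limit intent drift.

```python
from typing import Dict, List, Set

# Hypothetical synonym table. In the paper, synonyms are extracted from
# search logs and language models; these entries are illustrative only.
SYNONYMS: Dict[str, List[str]] = {
    "apt": ["apartment"],
    "st": ["street"],
    "clinic": ["medical centre"],
}

# Hypothetical stand-in for the term identification model: neighbouring
# tokens that suggest the term should NOT be reformulated.
BLOCKING_CONTEXT: Dict[str, Set[str]] = {
    "st": {"james", "mary"},  # "st james" more likely means "saint"
}


def reformulate(query: str) -> List[str]:
    """Return the normalised query followed by its synonym variants.

    A term is replaced only if no adjacent token appears in its
    blocking-context set, which keeps the original intent intact.
    """
    tokens = query.lower().split()
    variants = [" ".join(tokens)]
    for i, tok in enumerate(tokens):
        if tok not in SYNONYMS:
            continue
        # Look at the immediate left and right neighbours only.
        neighbours = set(tokens[max(0, i - 1):i] + tokens[i + 1:i + 2])
        if neighbours & BLOCKING_CONTEXT.get(tok, set()):
            continue  # context says this term is not a synonym candidate
        for syn in SYNONYMS[tok]:
            variants.append(" ".join(tokens[:i] + [syn] + tokens[i + 1:]))
    return variants


if __name__ == "__main__":
    print(reformulate("Orchard St"))
    print(reformulate("St James church"))
```

Under these assumptions, "Orchard St" gains the variant "orchard street", while "St James church" is left unchanged because the neighbouring token blocks the replacement. A production system would replace the rule table with a model and would also need to bound the added latency, as the abstract notes.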
