利用预训练模型改进兴趣点搜索的第一阶段检索

IF 9.1 2区计算机科学 Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS

ACM Transactions on Information Systems Pub Date : 2023-11-07 DOI:10.1145/3631937

Lang Mei, Jiaxin Mao, Juan Hu, Naiqiang Tan, Hua Chai, Ji-rong Wen

{"title":"利用预训练模型改进兴趣点搜索的第一阶段检索","authors":"Lang Mei, Jiaxin Mao, Juan Hu, Naiqiang Tan, Hua Chai, Ji-rong Wen","doi":"10.1145/3631937","DOIUrl":null,"url":null,"abstract":"Point-of-interest (POI) search is important for location-based services, such as navigation and online ride-hailing service. The goal of POI search is to find the most relevant destinations from a large-scale POI database given a text query. To improve the effectiveness and efficiency of POI search, most existing approaches are based on a multi-stage pipeline that consists of an efficiency-oriented retrieval stage and one or more effectiveness-oriented re-rank stages. In this paper, we focus on the first efficiency-oriented retrieval stage of the POI search. We first identify the limitations of existing first-stage POI retrieval models in capturing the semantic-geography relationship and modeling the fine-grained geographical context information. Then, we propose a Geo-Enhanced Dense Retrieval framework for POI search to alleviate the above problems. Specifically, the proposed framework leverages the capacity of pre-trained language models (e.g., BERT) and designs a pre-training approach to better model the semantic match between the query prefix and POIs. With the POI collection, we first perform a token-level pre-training task based on a geographical-sensitive masked language prediction, and design two retrieval-oriented pre-training tasks that link the address of each POI to its name and geo-location. With the user behavior logs collected from an online POI search system, we design two additional pre-training tasks based on users’ query reformulation behavior and the transitions between POIs. We also utilize a late-interaction network structure to model the fine-grained interactions between the text and geographical context information within an acceptable query latency. Extensive experiments on the real-world datasets collected from the Didichuxing application demonstrate that the proposed framework can achieve superior retrieval performance over existing first-stage POI retrieval methods.","PeriodicalId":50936,"journal":{"name":"ACM Transactions on Information Systems","volume":"235 1‐3","pages":"0"},"PeriodicalIF":9.1000,"publicationDate":"2023-11-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Improving First-stage Retrieval of Point-of-Interest Search by Pre-training Models\",\"authors\":\"Lang Mei, Jiaxin Mao, Juan Hu, Naiqiang Tan, Hua Chai, Ji-rong Wen\",\"doi\":\"10.1145/3631937\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Point-of-interest (POI) search is important for location-based services, such as navigation and online ride-hailing service. The goal of POI search is to find the most relevant destinations from a large-scale POI database given a text query. To improve the effectiveness and efficiency of POI search, most existing approaches are based on a multi-stage pipeline that consists of an efficiency-oriented retrieval stage and one or more effectiveness-oriented re-rank stages. In this paper, we focus on the first efficiency-oriented retrieval stage of the POI search. We first identify the limitations of existing first-stage POI retrieval models in capturing the semantic-geography relationship and modeling the fine-grained geographical context information. Then, we propose a Geo-Enhanced Dense Retrieval framework for POI search to alleviate the above problems. Specifically, the proposed framework leverages the capacity of pre-trained language models (e.g., BERT) and designs a pre-training approach to better model the semantic match between the query prefix and POIs. With the POI collection, we first perform a token-level pre-training task based on a geographical-sensitive masked language prediction, and design two retrieval-oriented pre-training tasks that link the address of each POI to its name and geo-location. With the user behavior logs collected from an online POI search system, we design two additional pre-training tasks based on users’ query reformulation behavior and the transitions between POIs. We also utilize a late-interaction network structure to model the fine-grained interactions between the text and geographical context information within an acceptable query latency. Extensive experiments on the real-world datasets collected from the Didichuxing application demonstrate that the proposed framework can achieve superior retrieval performance over existing first-stage POI retrieval methods.\",\"PeriodicalId\":50936,\"journal\":{\"name\":\"ACM Transactions on Information Systems\",\"volume\":\"235 1‐3\",\"pages\":\"0\"},\"PeriodicalIF\":9.1000,\"publicationDate\":\"2023-11-07\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"ACM Transactions on Information Systems\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3631937\",\"RegionNum\":2,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"COMPUTER SCIENCE, INFORMATION SYSTEMS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"ACM Transactions on Information Systems","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3631937","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, INFORMATION SYSTEMS","Score":null,"Total":0}

引用次数: 0

摘要

兴趣点(POI)搜索对于基于位置的服务很重要，比如导航和在线叫车服务。POI搜索的目标是从给定文本查询的大规模POI数据库中找到最相关的目的地。为了提高POI搜索的有效性和效率，大多数现有方法都基于多级管道，该管道由一个面向效率的检索阶段和一个或多个面向效率的重新排序阶段组成。在本文中，我们关注POI搜索的第一个面向效率的检索阶段。本文首先指出了现有第一阶段POI检索模型在捕获语义-地理关系和建模细粒度地理上下文信息方面的局限性。针对上述问题，提出了一种基于地理增强的密集检索框架。具体来说，所提出的框架利用了预训练语言模型(例如BERT)的能力，并设计了一种预训练方法来更好地建模查询前缀和poi之间的语义匹配。对于POI集合，我们首先基于地理敏感的掩码语言预测执行令牌级预训练任务，并设计两个面向检索的预训练任务，将每个POI的地址与其名称和地理位置联系起来。利用从在线POI搜索系统中收集的用户行为日志，我们基于用户的查询重新表述行为和POI之间的转换设计了两个额外的预训练任务。我们还利用后期交互网络结构在可接受的查询延迟内对文本和地理上下文信息之间的细粒度交互进行建模。在didihuxing应用中收集的真实数据集上进行的大量实验表明，所提出的框架比现有的第一阶段POI检索方法具有更好的检索性能。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Improving First-stage Retrieval of Point-of-Interest Search by Pre-training Models

Point-of-interest (POI) search is important for location-based services, such as navigation and online ride-hailing service. The goal of POI search is to find the most relevant destinations from a large-scale POI database given a text query. To improve the effectiveness and efficiency of POI search, most existing approaches are based on a multi-stage pipeline that consists of an efficiency-oriented retrieval stage and one or more effectiveness-oriented re-rank stages. In this paper, we focus on the first efficiency-oriented retrieval stage of the POI search. We first identify the limitations of existing first-stage POI retrieval models in capturing the semantic-geography relationship and modeling the fine-grained geographical context information. Then, we propose a Geo-Enhanced Dense Retrieval framework for POI search to alleviate the above problems. Specifically, the proposed framework leverages the capacity of pre-trained language models (e.g., BERT) and designs a pre-training approach to better model the semantic match between the query prefix and POIs. With the POI collection, we first perform a token-level pre-training task based on a geographical-sensitive masked language prediction, and design two retrieval-oriented pre-training tasks that link the address of each POI to its name and geo-location. With the user behavior logs collected from an online POI search system, we design two additional pre-training tasks based on users’ query reformulation behavior and the transitions between POIs. We also utilize a late-interaction network structure to model the fine-grained interactions between the text and geographical context information within an acceptable query latency. Extensive experiments on the real-world datasets collected from the Didichuxing application demonstrate that the proposed framework can achieve superior retrieval performance over existing first-stage POI retrieval methods.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

ACM Transactions on Information Systems 工程技术-计算机：信息系统

CiteScore

9.40

自引率

14.30%

发文量

165

审稿时长

>12 weeks

期刊介绍： The ACM Transactions on Information Systems (TOIS) publishes papers on information retrieval (such as search engines, recommender systems) that contain: new principled information retrieval models or algorithms with sound empirical validation; observational, experimental and/or theoretical studies yielding new insights into information retrieval or information seeking; accounts of applications of existing information retrieval techniques that shed light on the strengths and weaknesses of the techniques; formalization of new information retrieval or information seeking tasks and of methods for evaluating the performance on those tasks; development of content (text, image, speech, video, etc) analysis methods to support information retrieval and information seeking; development of computational models of user information preferences and interaction behaviors; creation and analysis of evaluation methodologies for information retrieval and information seeking; or surveys of existing work that propose a significant synthesis. The information retrieval scope of ACM Transactions on Information Systems (TOIS) appeals to industry practitioners for its wealth of creative ideas, and to academic researchers for its descriptions of their colleagues'' work.