Semantic Models for the First-Stage Retrieval: A Comprehensive Review

ACM Transactions on Information Systems (TOIS) Pub Date : 2021-03-08 DOI:10.1145/3486250

Yinqiong Cai, Yixing Fan, Jiafeng Guo, Fei Sun, Ruqing Zhang, Xueqi Cheng


Citations: 68

Abstract

Multi-stage ranking pipelines have been a practical solution in modern search systems, where the first-stage retrieval returns a subset of candidate documents and later stages re-rank those candidates. Unlike the re-ranking stages, which have gone through rapid technique shifts over the past decades, the first-stage retrieval has long been dominated by classical term-based models. Unfortunately, these models suffer from the vocabulary mismatch problem, which may keep relevant documents from ever reaching the re-ranking stages. Therefore, it has been a long-standing goal to build semantic models for the first-stage retrieval that can achieve high recall efficiently. Recently, we have witnessed explosive growth of research interest in first-stage semantic retrieval models. We believe it is the right time to survey the current status, learn from existing methods, and gain some insights for future development. In this article, we describe the current landscape of first-stage retrieval models under a unified framework to clarify the connections among classical term-based retrieval methods, early semantic retrieval methods, and neural semantic retrieval methods. Moreover, we identify some open challenges and envision some future directions, with the hope of inspiring more research on these important yet less investigated topics.
