An incremental approach to efficient pseudo-relevance feedback

Proceedings of the 36th international ACM SIGIR conference on Research and development in information retrieval Pub Date : 2013-07-28 DOI:10.1145/2484028.2484051

Hao Wu, Hui Fang

{"title":"An incremental approach to efficient pseudo-relevance feedback","authors":"Hao Wu, Hui Fang","doi":"10.1145/2484028.2484051","DOIUrl":null,"url":null,"abstract":"Pseudo-relevance feedback is an important strategy to improve search accuracy. It is often implemented as a two-round retrieval process: the first round is to retrieve an initial set of documents relevant to an original query, and the second round is to retrieve final retrieval results using the original query expanded with terms selected from the previously retrieved documents. This two-round retrieval process is clearly time consuming, which could arguably be one of main reasons that hinder the wide adaptation of the pseudo-relevance feedback methods in real-world IR systems. In this paper, we study how to improve the efficiency of pseudo-relevance feedback methods. The basic idea is to reduce the time needed for the second round of retrieval by leveraging the query processing results of the first round. Specifically, instead of processing the expand query as a newly submitted query, we propose an incremental approach, which resumes the query processing results (i.e. document accumulators) for the first round of retrieval and process the second round of retrieval mainly as a step of adjusting the scores in the accumulators. Experimental results on TREC Terabyte collections show that the proposed incremental approach can improve the efficiency of pseudo-relevance feedback methods by a factor of two without sacrificing their effectiveness.","PeriodicalId":178818,"journal":{"name":"Proceedings of the 36th international ACM SIGIR conference on Research and development in information retrieval","volume":"32 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2013-07-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"23","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 36th international ACM SIGIR conference on Research and development in information retrieval","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/2484028.2484051","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 23

Abstract

Pseudo-relevance feedback is an important strategy to improve search accuracy. It is often implemented as a two-round retrieval process: the first round is to retrieve an initial set of documents relevant to an original query, and the second round is to retrieve final retrieval results using the original query expanded with terms selected from the previously retrieved documents. This two-round retrieval process is clearly time consuming, which could arguably be one of main reasons that hinder the wide adaptation of the pseudo-relevance feedback methods in real-world IR systems. In this paper, we study how to improve the efficiency of pseudo-relevance feedback methods. The basic idea is to reduce the time needed for the second round of retrieval by leveraging the query processing results of the first round. Specifically, instead of processing the expand query as a newly submitted query, we propose an incremental approach, which resumes the query processing results (i.e. document accumulators) for the first round of retrieval and process the second round of retrieval mainly as a step of adjusting the scores in the accumulators. Experimental results on TREC Terabyte collections show that the proposed incremental approach can improve the efficiency of pseudo-relevance feedback methods by a factor of two without sacrificing their effectiveness.

查看原文本刊更多论文

一种有效伪相关反馈的增量方法

伪相关反馈是提高搜索精度的重要策略。它通常被实现为两轮检索过程:第一轮是检索与原始查询相关的一组初始文档，第二轮是检索使用从先前检索的文档中选择的术语展开的原始查询的最终检索结果。这种两轮检索过程明显耗时，这可能是阻碍伪相关反馈方法在现实红外系统中广泛应用的主要原因之一。本文研究了如何提高伪相关反馈方法的效率。基本思想是通过利用第一轮的查询处理结果来减少第二轮检索所需的时间。具体来说，我们提出了一种增量方法，将扩展查询作为新提交的查询来处理，即恢复第一轮检索的查询处理结果(即文档累加器)，并将第二轮检索主要作为调整累加器中的分数的步骤来处理。在TREC tb集合上的实验结果表明，该方法可以在不牺牲伪相关反馈方法有效性的前提下，将伪相关反馈方法的效率提高2倍。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

求助全文

约1分钟内获得全文求助全文

来源期刊

Proceedings of the 36th international ACM SIGIR conference on Research and development in information retrieval

自引率

0.00%

发文量