A Comparison between Term-Independence Retrieval Models for Ad Hoc Retrieval

ACM Transactions on Information Systems (TOIS). Pub Date: 2021-12-08. DOI: 10.1145/3483612

E. K. F. Dang, R. Luk, James Allan



Abstract

In Information Retrieval, numerous retrieval models or document ranking functions have been developed in the quest for better retrieval effectiveness. Apart from some formal retrieval models formulated on a theoretical basis, various recent works have applied heuristic constraints to guide the derivation of document ranking functions. While many recent methods are shown to improve over established and successful models, comparison among these new methods under a common environment is often missing. To address this issue, we perform an extensive and up-to-date comparison of leading term-independence retrieval models implemented in our own retrieval system. Our study focuses on the following questions: (RQ1) Is there a retrieval model that consistently outperforms all other models across multiple collections? (RQ2) What are the important features of an effective document ranking function? Our retrieval experiments, performed on several TREC test collections of a wide range of sizes (up to the terabyte-sized ClueWeb09 Category B), enable us to answer these research questions. This work also serves as a reproducibility study for leading retrieval models. While our experiments show that no single retrieval model outperforms all others across all tested collections, some recent retrieval models, such as MATF and MVD, consistently perform better than the common baselines.

