A Comparison between Term-Independence Retrieval Models for Ad Hoc Retrieval

E. K. F. Dang, R. Luk, James Allan
ACM Transactions on Information Systems (TOIS), pp. 1-37. Journal Article. Published 2021-12-08.
DOI: https://doi.org/10.1145/3483612

Abstract

In Information Retrieval, numerous retrieval models or document ranking functions have been developed in the quest for better retrieval effectiveness. Apart from some formal retrieval models formulated on a theoretical basis, various recent works have applied heuristic constraints to guide the derivation of document ranking functions. While many recent methods are shown to improve over established and successful models, a comparison among these new methods under a common environment is often missing. To address this issue, we perform an extensive and up-to-date comparison of leading term-independence retrieval models implemented in our own retrieval system. Our study focuses on the following questions: (RQ1) Is there a retrieval model that consistently outperforms all other models across multiple collections? (RQ2) What are the important features of an effective document ranking function? Our retrieval experiments, performed on several TREC test collections spanning a wide range of sizes (up to the terabyte-sized ClueWeb09 Category B), enable us to answer these research questions. This work also serves as a reproducibility study for leading retrieval models. While our experiments show that no single retrieval model outperforms all others across all tested collections, some recent retrieval models, such as MATF and MVD, consistently perform better than the common baselines.
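
As an illustration of what "term-independence" means for a ranking function, the following minimal sketch scores a document as a sum of per-term contributions, each computed without reference to the other query terms. The abstract gives no formulas and does not name its baselines, so the choice of a BM25-style weighting here is an assumption made only because it is a familiar example of this class; the parameter values `k1=1.2` and `b=0.75`, the function names `bm25_score` and `rank`, and the whitespace-tokenized toy documents are likewise hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def bm25_score(query_terms, doc_terms, doc_freq, num_docs, avg_doc_len,
               k1=1.2, b=0.75):
    """Score one document as a sum of independent per-term contributions.

    BM25-style weighting is an illustrative assumption, not the paper's method.
    """
    tf = Counter(doc_terms)
    doc_len = len(doc_terms)
    score = 0.0
    for term in query_terms:
        freq = tf[term]
        n_t = doc_freq.get(term, 0)
        if freq == 0 or n_t == 0:
            continue  # a term absent from the document contributes nothing
        # Inverse document frequency (non-negative variant).
        idf = math.log(1.0 + (num_docs - n_t + 0.5) / (n_t + 0.5))
        # Saturating term frequency with document length normalization.
        length_norm = 1.0 - b + b * doc_len / avg_doc_len
        norm_tf = freq * (k1 + 1.0) / (freq + k1 * length_norm)
        # Each term's weight depends only on that term's own statistics.
        score += idf * norm_tf
    return score


def rank(query_terms, docs):
    """Rank a dict of {doc_id: token list} by descending score."""
    num_docs = len(docs)
    avg_doc_len = sum(len(terms) for terms in docs.values()) / num_docs
    # Document frequency: number of documents containing each term.
    doc_freq = Counter(t for terms in docs.values() for t in set(terms))
    scores = {
        doc_id: bm25_score(query_terms, terms, doc_freq, num_docs, avg_doc_len)
        for doc_id, terms in docs.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

Because the score is additive over query terms, the score for a two-term query equals the sum of the scores for each term queried alone; models that capture term dependence (for example, proximity or phrase features) do not have this property.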