Relevance feedback for building pooled test collections

Impact Factor 1.8 · CAS Zone 4 (Management Science) · JCR Q3 (Computer Science, Information Systems)
David Otero, Javier Parapar, Álvaro Barreiro
{"title":"Relevance feedback for building pooled test collections","authors":"David Otero, Javier Parapar, Álvaro Barreiro","doi":"10.1177/01655515231171085","DOIUrl":null,"url":null,"abstract":"Offline evaluation of information retrieval systems depends on test collections. These datasets provide the researchers with a corpus of documents, topics and relevance judgements indicating which documents are relevant for each topic. Gathering the latter is costly, requiring human assessors to judge the documents. Therefore, experts usually judge only a portion of the corpus. The most common approach for selecting that subset is pooling. By intelligently choosing which documents to assess, it is possible to optimise the number of positive labels for a given budget. For this reason, much work has focused on developing techniques to better select which documents from the corpus merit human assessments. In this article, we propose using relevance feedback to prioritise the documents when building new pooled test collections. We explore several state-of-the-art statistical feedback methods for prioritising the documents the algorithm presents to the assessors. A thorough comparison on eight Text Retrieval Conference (TREC) datasets against strong baselines shows that, among other results, our proposals improve in retrieving relevant documents with lower assessment effort than other state-of-the-art adjudicating methods without harming the reliability, fairness and reusability.","PeriodicalId":54796,"journal":{"name":"Journal of Information Science","volume":" ","pages":""},"PeriodicalIF":1.8000,"publicationDate":"2023-05-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Information Science","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.1177/01655515231171085","RegionNum":4,"RegionCategory":"管理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"COMPUTER SCIENCE, INFORMATION SYSTEMS","Score":null,"Total":0}
Citations: 0

Abstract

Offline evaluation of information retrieval systems depends on test collections. These datasets provide researchers with a corpus of documents, a set of topics and relevance judgements indicating which documents are relevant for each topic. Gathering the judgements is costly because it requires human assessors to judge the documents, so experts usually judge only a portion of the corpus. The most common approach for selecting that subset is pooling. By intelligently choosing which documents to assess, it is possible to maximise the number of positive labels obtained for a given budget. For this reason, much work has focused on developing techniques to better select which documents from the corpus merit human assessment. In this article, we propose using relevance feedback to prioritise documents when building new pooled test collections. We explore several state-of-the-art statistical feedback methods for prioritising the documents that the algorithm presents to the assessors. A thorough comparison against strong baselines on eight Text Retrieval Conference (TREC) datasets shows that, among other results, our proposals retrieve more relevant documents with lower assessment effort than other state-of-the-art adjudication methods, without harming reliability, fairness or reusability.
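To make the approach concrete, here is a minimal sketch of an adjudication loop that prioritises pooled documents with statistical relevance feedback. It is an illustration of the general idea only, not the authors' implementation: the term-counting feedback model, the function names and the toy data are all assumptions standing in for the statistical feedback estimators and TREC runs studied in the article.

from collections import Counter


def feedback_scores(doc_terms, judged_relevant, n_terms=10):
    """Score documents by overlap with the most frequent terms of the
    documents judged relevant so far (a crude term-frequency feedback model)."""
    model = Counter()
    for doc_id in judged_relevant:
        model.update(doc_terms[doc_id])
    top_terms = dict(model.most_common(n_terms))
    total = sum(top_terms.values()) or 1
    weights = {term: count / total for term, count in top_terms.items()}
    return {
        doc_id: sum(weights.get(term, 0.0) for term in terms)
        for doc_id, terms in doc_terms.items()
    }


def adjudicate(pool, doc_terms, judge, budget):
    """Present up to `budget` documents from `pool` to the assessor (`judge`),
    re-prioritising the remaining pool with feedback after every judgement."""
    judged, relevant = {}, set()
    remaining = list(pool)
    for _ in range(budget):
        if not remaining:
            break
        if relevant:
            scores = feedback_scores(doc_terms, relevant)
            remaining.sort(key=lambda d: scores.get(d, 0.0), reverse=True)
        doc_id = remaining.pop(0)
        judged[doc_id] = judge(doc_id)  # human relevance assessment
        if judged[doc_id]:
            relevant.add(doc_id)
    return judged


if __name__ == "__main__":
    # Toy pool: short term lists stand in for document content.
    doc_terms = {
        "d1": ["pooling", "trec", "judgement"],
        "d2": ["retrieval", "pooling", "assessor"],
        "d3": ["cooking", "recipe"],
        "d4": ["pooling", "assessor", "budget"],
    }

    def is_relevant(doc_id):
        # Stand-in for the human assessor in this toy example.
        return "pooling" in doc_terms[doc_id]

    print(adjudicate(list(doc_terms), doc_terms, is_relevant, budget=3))

With a budget of three judgements, the loop first judges an arbitrary pooled document and then uses the terms of the documents already judged relevant to push similar unjudged documents to the front of the queue, which is the behaviour the article evaluates against other adjudication strategies.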
Source journal
Journal of Information Science (Engineering & Technology: Computer Science, Information Systems)
CiteScore: 6.80
Self-citation rate: 8.30%
Articles per year: 121
Review time: 4 months
Journal description: The Journal of Information Science is a peer-reviewed international journal of high repute covering topics of interest to all those researching and working in the sciences of information and knowledge management. The Editors welcome material on any aspect of information science theory, policy, application or practice that will advance thinking in the field.