Reproducibility Challenges in Information Retrieval Evaluation

N. Ferro
{"title":"Reproducibility Challenges in Information Retrieval Evaluation","authors":"N. Ferro","doi":"10.1145/3020206","DOIUrl":null,"url":null,"abstract":"Information Retrieval (IR) is concerned with ranking information resources with respect to user information needs, delivering a wide range of key applications for industry and society, such as Web search engines [Croft et al. 2009], intellectual property, and patent search [Lupu and Hanbury 2013], and many others. The performance of IR systems is determined not only by their efficiency but also and most importantly by their effectiveness, that is, their ability to retrieve and better rank relevant information resources while at the same time suppressing the retrieval of not relevant ones. Due to the many sources of uncertainty, as for example vague user information needs, unstructured information sources, or subjective notion of relevance, experimental evaluation is the only mean to assess the performances of IR systems from the effectiveness point of view. Experimental evaluation relies on the Cranfield paradigm, which makes use of experimental collections, consisting of documents, sampled from a real domain of interest; topics, representing real user information needs in that domain; and relevance judgements, determining which documents are relevant to which topics [Harman 2011]. To share the effort and optimize the use of resources, experimental evaluation is usually carried out in publicly open and large-scale evaluation campaigns at the international level, like the Text REtrieval Conference (TREC)1 in the United States [Harman and Voorhees 2005], the Conference and Labs of the Evaluation Forum (CLEF)2 in Europe [Ferro 2014], the NII Testbeds and Community for Information access Research (NTCIR)3 in Japan and Asia, and the Forum for Information Retrieval Evaluation (FIRE)4 in India. These initiatives produce, every year, huge amounts of scientific data","PeriodicalId":15582,"journal":{"name":"Journal of Data and Information Quality (JDIQ)","volume":"52 1","pages":"1 - 4"},"PeriodicalIF":0.0000,"publicationDate":"2017-01-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"49","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Data and Information Quality (JDIQ)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3020206","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 49

Abstract

Information Retrieval (IR) is concerned with ranking information resources with respect to user information needs, delivering a wide range of key applications for industry and society, such as Web search engines [Croft et al. 2009], intellectual property and patent search [Lupu and Hanbury 2013], and many others. The performance of IR systems is determined not only by their efficiency but also, and most importantly, by their effectiveness, that is, their ability to retrieve and better rank relevant information resources while at the same time suppressing the retrieval of non-relevant ones. Due to the many sources of uncertainty, such as vague user information needs, unstructured information sources, or the subjective notion of relevance, experimental evaluation is the only means to assess the performance of IR systems from the effectiveness point of view. Experimental evaluation relies on the Cranfield paradigm, which makes use of experimental collections consisting of documents, sampled from a real domain of interest; topics, representing real user information needs in that domain; and relevance judgements, determining which documents are relevant to which topics [Harman 2011]. To share the effort and optimize the use of resources, experimental evaluation is usually carried out in publicly open and large-scale evaluation campaigns at the international level, such as the Text REtrieval Conference (TREC)1 in the United States [Harman and Voorhees 2005], the Conference and Labs of the Evaluation Forum (CLEF)2 in Europe [Ferro 2014], the NII Testbeds and Community for Information access Research (NTCIR)3 in Japan and Asia, and the Forum for Information Retrieval Evaluation (FIRE)4 in India. These initiatives produce, every year, huge amounts of scientific data.
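To make the Cranfield-style workflow concrete, the following is a minimal sketch (not taken from the paper) of how effectiveness is scored from its three ingredients: a run of ranked documents per topic and a set of relevance judgements (qrels). It computes Average Precision per topic and Mean Average Precision over topics; the topic and document identifiers are hypothetical, and real campaigns such as TREC use dedicated tools (e.g., trec_eval) and many more metrics.

```python
# Minimal sketch of Cranfield-style effectiveness scoring (illustrative only):
# given relevance judgements (qrels) and a ranked run per topic, compute
# Average Precision (AP) per topic and Mean Average Precision (MAP) overall.

def average_precision(ranked_docs, relevant_docs):
    """AP for one topic: mean of precision@k over the ranks k holding a relevant document."""
    hits, precision_sum = 0, 0.0
    for k, doc_id in enumerate(ranked_docs, start=1):
        if doc_id in relevant_docs:
            hits += 1
            precision_sum += hits / k
    return precision_sum / len(relevant_docs) if relevant_docs else 0.0

def mean_average_precision(run, qrels):
    """MAP over topics; 'run' maps topic -> ranked doc ids, 'qrels' maps topic -> set of relevant doc ids."""
    ap_scores = [average_precision(docs, qrels.get(topic, set())) for topic, docs in run.items()]
    return sum(ap_scores) / len(ap_scores) if ap_scores else 0.0

# Toy example with two hypothetical topics.
qrels = {"T1": {"d1", "d4"}, "T2": {"d2"}}
run = {"T1": ["d1", "d3", "d4"], "T2": ["d5", "d2"]}
print(mean_average_precision(run, qrels))  # approximately 0.667
```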