Reproduce and Improve

Kevin Roitero, Michael Soprano, Andrea Brunello, Stefano Mizzaro
{"title":"Reproduce and Improve","authors":"Kevin Roitero, Michael Soprano, Andrea Brunello, Stefano Mizzaro","doi":"10.1145/3239573","DOIUrl":null,"url":null,"abstract":"Effectiveness evaluation of information retrieval systems by means of a test collection is a widely used methodology. However, it is rather expensive in terms of resources, time, and money; therefore, many researchers have proposed methods for a cheaper evaluation. One particular approach, on which we focus in this article, is to use fewer topics: in TREC-like initiatives, usually system effectiveness is evaluated as the average effectiveness on a set of n topics (usually, n=50, but more than 1,000 have been also adopted); instead of using the full set, it has been proposed to find the best subsets of a few good topics that evaluate the systems in the most similar way to the full set. The computational complexity of the task has so far limited the analysis that has been performed. We develop a novel and efficient approach based on a multi-objective evolutionary algorithm. The higher efficiency of our new implementation allows us to reproduce some notable results on topic set reduction, as well as perform new experiments to generalize and improve such results. We show that our approach is able to both reproduce the main state-of-the-art results and to allow us to analyze the effect of the collection, metric, and pool depth used for the evaluation. Finally, differently from previous studies, which have been mainly theoretical, we are also able to discuss some practical topic selection strategies, integrating results of automatic evaluation approaches.","PeriodicalId":15582,"journal":{"name":"Journal of Data and Information Quality (JDIQ)","volume":"27 1","pages":"1 - 21"},"PeriodicalIF":0.0000,"publicationDate":"2018-09-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"7","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Data and Information Quality (JDIQ)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3239573","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 7

Abstract

Effectiveness evaluation of information retrieval systems by means of a test collection is a widely used methodology. However, it is rather expensive in terms of resources, time, and money; therefore, many researchers have proposed methods for a cheaper evaluation. One particular approach, on which we focus in this article, is to use fewer topics: in TREC-like initiatives, usually system effectiveness is evaluated as the average effectiveness on a set of n topics (usually, n=50, but more than 1,000 have been also adopted); instead of using the full set, it has been proposed to find the best subsets of a few good topics that evaluate the systems in the most similar way to the full set. The computational complexity of the task has so far limited the analysis that has been performed. We develop a novel and efficient approach based on a multi-objective evolutionary algorithm. The higher efficiency of our new implementation allows us to reproduce some notable results on topic set reduction, as well as perform new experiments to generalize and improve such results. We show that our approach is able to both reproduce the main state-of-the-art results and to allow us to analyze the effect of the collection, metric, and pool depth used for the evaluation. Finally, differently from previous studies, which have been mainly theoretical, we are also able to discuss some practical topic selection strategies, integrating results of automatic evaluation approaches.
再生产和改进
利用测试集对信息检索系统进行有效性评估是一种广泛使用的方法。然而,在资源、时间和金钱方面,这是相当昂贵的;因此,许多研究人员提出了更便宜的评估方法。我们在本文中关注的一种特殊方法是使用更少的主题:在类似trec的计划中,系统有效性通常被评估为一组n个主题的平均有效性(通常,n=50,但也采用了1000多个);与其使用完整集,还不如找到几个好主题的最佳子集,这些主题以与完整集最相似的方式评估系统。到目前为止,该任务的计算复杂性限制了已执行的分析。本文提出了一种基于多目标进化算法的新型高效方法。我们的新实现的更高效率使我们能够在主题集约简上重现一些显著的结果,并进行新的实验来推广和改进这些结果。我们表明,我们的方法既能够再现最先进的主要结果,又允许我们分析用于评估的收集、度量和池深度的影响。最后,与以往的研究主要是理论性的不同,我们也能够讨论一些实际的选题策略,整合自动评价方法的结果。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:604180095
Book学术官方微信