Reproduce and Improve

Journal of Data and Information Quality (JDIQ), Pub Date: 2018-09-29, DOI: 10.1145/3239573

Kevin Roitero, Michael Soprano, Andrea Brunello, Stefano Mizzaro

Citations: 7

Abstract

Evaluating the effectiveness of information retrieval systems by means of a test collection is a widely used methodology. However, it is expensive in terms of resources, time, and money; therefore, many researchers have proposed methods for cheaper evaluation. One particular approach, on which we focus in this article, is to use fewer topics: in TREC-like initiatives, system effectiveness is usually evaluated as the average effectiveness on a set of n topics (usually n = 50, although more than 1,000 have also been adopted); instead of using the full set, it has been proposed to find the best subsets of a few good topics that evaluate the systems in the way most similar to the full set. The computational complexity of the task has so far limited the analysis that has been performed. We develop a novel and efficient approach based on a multi-objective evolutionary algorithm. The higher efficiency of our new implementation allows us to reproduce some notable results on topic set reduction, as well as to perform new experiments to generalize and improve such results. We show that our approach is able both to reproduce the main state-of-the-art results and to allow us to analyze the effect of the collection, metric, and pool depth used for the evaluation. Finally, unlike previous studies, which have been mainly theoretical, we are also able to discuss some practical topic selection strategies, integrating results of automatic evaluation approaches.
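The topic-subset idea in the abstract can be illustrated with a minimal sketch. It is not the authors' method: the article uses a multi-objective evolutionary algorithm, whereas this example swaps in plain exhaustive search, which is only feasible for very small topic sets. The score matrix `SCORES`, the helper names, and the use of Pearson correlation as the similarity measure are all assumptions made for illustration.

```python
from itertools import combinations
from math import sqrt

# Hypothetical toy data: one row per system, one column per topic
# (for example, average precision values). Not taken from the article.
SCORES = [
    [0.10, 0.40, 0.30, 0.20],
    [0.20, 0.50, 0.10, 0.40],
    [0.60, 0.30, 0.50, 0.60],
    [0.70, 0.80, 0.60, 0.50],
]


def mean_over_topics(scores, topics):
    """Average each system's effectiveness over the given topic indices."""
    topics = list(topics)
    return [sum(row[t] for t in topics) / len(topics) for row in scores]


def pearson(x, y):
    """Pearson correlation of two equal-length lists (0.0 if either is constant)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


def best_subset(scores, k):
    """Exhaustively find the k-topic subset whose per-system means
    correlate best with the means over the full topic set.
    Assumption: exhaustive search stands in for the evolutionary search."""
    n_topics = len(scores[0])
    full = mean_over_topics(scores, range(n_topics))
    return max(
        (pearson(mean_over_topics(scores, subset), full), subset)
        for subset in combinations(range(n_topics), k)
    )


if __name__ == "__main__":
    for k in range(1, len(SCORES[0]) + 1):
        corr, subset = best_subset(SCORES, k)
        print(f"k={k}: topics={subset} correlation={corr:.3f}")
```

The number of candidate subsets grows combinatorially with the number of topics, which is the computational obstacle the abstract refers to and the reason a heuristic search is attractive for realistic collections.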

Source journal

Journal of Data and Information Quality (JDIQ)

Self-citation rate

0.00%

Articles published