Reproduce. Generalize. Extend. On Information Retrieval Evaluation without Relevance Judgments

Kevin Roitero, Marco Passon, G. Serra, Stefano Mizzaro
{"title":"繁殖。概括。扩展。无关联判断的信息检索评价研究","authors":"Kevin Roitero, Marco Passon, G. Serra, Stefano Mizzaro","doi":"10.1145/3241064","DOIUrl":null,"url":null,"abstract":"The evaluation of retrieval effectiveness by means of test collections is a commonly used methodology in the information retrieval field. Some researchers have addressed the quite fascinating research question of whether it is possible to evaluate effectiveness completely automatically, without human relevance assessments. Since human relevance assessment is one of the main costs of building a test collection, both in human time and money resources, this rather ambitious goal would have a practical impact. In this article, we reproduce the main results on evaluating information retrieval systems without relevance judgments; furthermore, we generalize such previous work to analyze the effect of test collections, evaluation metrics, and pool depth. We also expand the idea to semi-automatic evaluation and estimation of topic difficulty. Our results show that (i) previous work is overall reproducible, although some specific results are not; (ii) collection, metric, and pool depth impact the automatic evaluation of systems, which is anyway accurate in several cases; (iii) semi-automatic evaluation is an effective methodology; and (iv) automatic evaluation can (to some extent) be used to predict topic difficulty.","PeriodicalId":15582,"journal":{"name":"Journal of Data and Information Quality (JDIQ)","volume":"69 1","pages":"1 - 32"},"PeriodicalIF":0.0000,"publicationDate":"2018-09-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"4","resultStr":"{\"title\":\"Reproduce. Generalize. Extend. On Information Retrieval Evaluation without Relevance Judgments\",\"authors\":\"Kevin Roitero, Marco Passon, G. Serra, Stefano Mizzaro\",\"doi\":\"10.1145/3241064\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"The evaluation of retrieval effectiveness by means of test collections is a commonly used methodology in the information retrieval field. Some researchers have addressed the quite fascinating research question of whether it is possible to evaluate effectiveness completely automatically, without human relevance assessments. Since human relevance assessment is one of the main costs of building a test collection, both in human time and money resources, this rather ambitious goal would have a practical impact. In this article, we reproduce the main results on evaluating information retrieval systems without relevance judgments; furthermore, we generalize such previous work to analyze the effect of test collections, evaluation metrics, and pool depth. We also expand the idea to semi-automatic evaluation and estimation of topic difficulty. 
Our results show that (i) previous work is overall reproducible, although some specific results are not; (ii) collection, metric, and pool depth impact the automatic evaluation of systems, which is anyway accurate in several cases; (iii) semi-automatic evaluation is an effective methodology; and (iv) automatic evaluation can (to some extent) be used to predict topic difficulty.\",\"PeriodicalId\":15582,\"journal\":{\"name\":\"Journal of Data and Information Quality (JDIQ)\",\"volume\":\"69 1\",\"pages\":\"1 - 32\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2018-09-29\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"4\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Journal of Data and Information Quality (JDIQ)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3241064\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Data and Information Quality (JDIQ)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3241064","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 4

Abstract

The evaluation of retrieval effectiveness by means of test collections is a commonly used methodology in the information retrieval field. Some researchers have addressed the quite fascinating research question of whether it is possible to evaluate effectiveness completely automatically, without human relevance assessments. Since human relevance assessment is one of the main costs of building a test collection, both in human time and money resources, this rather ambitious goal would have a practical impact. In this article, we reproduce the main results on evaluating information retrieval systems without relevance judgments; furthermore, we generalize such previous work to analyze the effect of test collections, evaluation metrics, and pool depth. We also expand the idea to semi-automatic evaluation and estimation of topic difficulty. Our results show that (i) previous work is overall reproducible, although some specific results are not; (ii) collection, metric, and pool depth impact the automatic evaluation of systems, which is anyway accurate in several cases; (iii) semi-automatic evaluation is an effective methodology; and (iv) automatic evaluation can (to some extent) be used to predict topic difficulty.
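As background for the abstract's notion of evaluating systems without human judgments: the line of work the article reproduces includes the well-known random pseudo-judgments protocol of Soboroff, Nicholas, and Cahan (SIGIR 2001), in which randomly sampled pooled documents stand in for human relevance assessments. Below is a minimal sketch of that kind of protocol, not the authors' code; the function names, the data layout (per-topic ranked runs and document pools), the 10% sampling rate, and the use of precision at 10 as the effectiveness metric are all assumptions made for illustration.

```python
# Minimal sketch of automatic evaluation via random pseudo-judgments
# (in the spirit of Soboroff et al., SIGIR 2001). Illustrative only:
# names, data layout, and parameter values are assumptions, not the
# article's actual code.
import random
from scipy.stats import kendalltau

def pseudo_qrels(pool, sample_rate=0.1, rng=random):
    """Randomly mark a fraction of a topic's pooled documents as
    (pseudo-)relevant, standing in for human assessors."""
    k = max(1, int(len(pool) * sample_rate))
    return set(rng.sample(sorted(pool), k))

def precision_at_10(ranking, relevant):
    """Toy effectiveness metric scored against the pseudo-qrels."""
    return sum(1 for doc in ranking[:10] if doc in relevant) / 10

def automatic_scores(runs, pools, trials=50, sample_rate=0.1):
    """Score each system by averaging its pseudo-effectiveness over
    topics and over repeated random samplings of pseudo-qrels.

    runs:  {system: {topic: ranked list of doc ids}}
    pools: {topic: set of pooled doc ids}
    """
    scores = {system: 0.0 for system in runs}
    for _ in range(trials):
        for topic, pool in pools.items():
            relevant = pseudo_qrels(pool, sample_rate)
            for system, per_topic in runs.items():
                scores[system] += precision_at_10(per_topic[topic], relevant)
    n = trials * len(pools)
    return {system: total / n for system, total in scores.items()}

def ranking_accuracy(auto, official):
    """Kendall's tau between the automatic and the official
    (human-judged) system ranking, the usual accuracy measure."""
    systems = sorted(auto)
    tau, _ = kendalltau([auto[s] for s in systems],
                        [official[s] for s in systems])
    return tau
```

The accuracy of such an automatic evaluation is typically reported as the rank correlation (e.g., Kendall's tau, as above) between the pseudo-ranking and the official, human-judged ranking of the systems. Semi-automatic evaluation, as studied in the article, combines a limited amount of real human judging with such automatic estimates; the sketch covers only the fully automatic end of that spectrum.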