A statistical method for system evaluation using incomplete judgments

J. Aslam, Virgil Pavlu, Emine Yilmaz
{"title":"A statistical method for system evaluation using incomplete judgments","authors":"J. Aslam, Virgil Pavlu, Emine Yilmaz","doi":"10.1145/1148170.1148263","DOIUrl":null,"url":null,"abstract":"We consider the problem of large-scale retrieval evaluation, and we propose a statistical method for evaluating retrieval systems using incomplete judgments. Unlike existing techniques that (1) rely on effectively complete, and thus prohibitively expensive, relevance judgment sets, (2) produce biased estimates of standard performance measures, or (3) produce estimates of non-standard measures thought to be correlated with these standard measures, our proposed statistical technique produces unbiased estimates of the standard measures themselves.Our proposed technique is based on random sampling. While our estimates are unbiased by statistical design, their variance is dependent on the sampling distribution employed; as such, we derive a sampling distribution likely to yield low variance estimates. We test our proposed technique using benchmark TREC data, demonstrating that a sampling pool derived from a set of runs can be used to efficiently and effectively evaluate those runs. We further show that these sampling pools generalize well to unseen runs. Our experiments indicate that highly accurate estimates of standard performance measures can be obtained using a number of relevance judgments as small as 4% of the typical TREC-style judgment pool.","PeriodicalId":433366,"journal":{"name":"Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval","volume":"1 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2006-08-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"185","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/1148170.1148263","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 185

Abstract

We consider the problem of large-scale retrieval evaluation, and we propose a statistical method for evaluating retrieval systems using incomplete judgments. Unlike existing techniques that (1) rely on effectively complete, and thus prohibitively expensive, relevance judgment sets, (2) produce biased estimates of standard performance measures, or (3) produce estimates of non-standard measures thought to be correlated with these standard measures, our proposed statistical technique produces unbiased estimates of the standard measures themselves.Our proposed technique is based on random sampling. While our estimates are unbiased by statistical design, their variance is dependent on the sampling distribution employed; as such, we derive a sampling distribution likely to yield low variance estimates. We test our proposed technique using benchmark TREC data, demonstrating that a sampling pool derived from a set of runs can be used to efficiently and effectively evaluate those runs. We further show that these sampling pools generalize well to unseen runs. Our experiments indicate that highly accurate estimates of standard performance measures can be obtained using a number of relevance judgments as small as 4% of the typical TREC-style judgment pool.
使用不完全判断进行系统评价的统计方法
考虑了大规模检索评价问题,提出了一种利用不完全判断对检索系统进行评价的统计方法。不像现有的技术(1)依赖于有效完整的,因此非常昂贵的相关性判断集,(2)产生标准性能度量的有偏差估计,或(3)产生被认为与这些标准度量相关的非标准度量的估计,我们提出的统计技术产生标准度量本身的无偏估计。我们提出的技术是基于随机抽样的。虽然我们的估计是无偏的统计设计,他们的方差是依赖于抽样分布;因此,我们推导出一个可能产生低方差估计的抽样分布。我们使用基准TREC数据测试了我们提出的技术,证明了从一组运行中获得的采样池可以有效地评估这些运行。我们进一步表明,这些采样池可以很好地推广到未见过的运行。我们的实验表明,使用少量的相关性判断(仅占典型trec风格判断池的4%)就可以获得对标准性能度量的高度准确的估计。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信