A Simple and Effective Approach to Score Standardisation

T. Sakai
{"title":"一种简单有效的分数标准化方法","authors":"T. Sakai","doi":"10.1145/2970398.2970399","DOIUrl":null,"url":null,"abstract":"Webber, Moffat and Zobel proposed score standardization for information retrieval evaluation with multiple test collections. Given a topic-by-run raw score matrix in terms of some evaluation measure, each score can be standardised using the topic's sample mean and sample standard deviation across a set of past runs so as to quantify how different a system is from the \"average\" system in standard deviation units. Using standardised scores, researchers can compare systems across different test collections without worrying about topic hardness or normalisation. WhileWebber et al. mapped the standardised scores to the [0, 1] range using a standard normal cumulative density function, the present study demonstrates that linear transformation of the standardised scores, a method widely used in educational research, can be a simple and effective alternative. We use three TREC robust track data sets with graded relevance assessments and official runs to compare these methods by means of leave-one-out tests, discriminative power, swap rate tests, and topic set size design. In particular, we demonstrate that our method is superior to the method of Webber et al. in terms of swap rates and topic set size design: put simply, our method ensures pairwise system comparisons that are more consistent across different data sets, and is arguably more convenient for designing a new test collection from a statistical viewpoint.","PeriodicalId":443715,"journal":{"name":"Proceedings of the 2016 ACM International Conference on the Theory of Information Retrieval","volume":"118 4-5","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2016-09-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"17","resultStr":"{\"title\":\"A Simple and Effective Approach to Score Standardisation\",\"authors\":\"T. Sakai\",\"doi\":\"10.1145/2970398.2970399\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Webber, Moffat and Zobel proposed score standardization for information retrieval evaluation with multiple test collections. Given a topic-by-run raw score matrix in terms of some evaluation measure, each score can be standardised using the topic's sample mean and sample standard deviation across a set of past runs so as to quantify how different a system is from the \\\"average\\\" system in standard deviation units. Using standardised scores, researchers can compare systems across different test collections without worrying about topic hardness or normalisation. WhileWebber et al. mapped the standardised scores to the [0, 1] range using a standard normal cumulative density function, the present study demonstrates that linear transformation of the standardised scores, a method widely used in educational research, can be a simple and effective alternative. We use three TREC robust track data sets with graded relevance assessments and official runs to compare these methods by means of leave-one-out tests, discriminative power, swap rate tests, and topic set size design. In particular, we demonstrate that our method is superior to the method of Webber et al. 
in terms of swap rates and topic set size design: put simply, our method ensures pairwise system comparisons that are more consistent across different data sets, and is arguably more convenient for designing a new test collection from a statistical viewpoint.\",\"PeriodicalId\":443715,\"journal\":{\"name\":\"Proceedings of the 2016 ACM International Conference on the Theory of Information Retrieval\",\"volume\":\"118 4-5\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2016-09-12\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"17\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the 2016 ACM International Conference on the Theory of Information Retrieval\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/2970398.2970399\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 2016 ACM International Conference on the Theory of Information Retrieval","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/2970398.2970399","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 17

Abstract

Webber, Moffat and Zobel proposed score standardisation for information retrieval evaluation with multiple test collections. Given a topic-by-run raw score matrix in terms of some evaluation measure, each score can be standardised using the topic's sample mean and sample standard deviation across a set of past runs, so as to quantify how different a system is from the "average" system in standard deviation units. Using standardised scores, researchers can compare systems across different test collections without worrying about topic hardness or normalisation. While Webber et al. mapped the standardised scores to the [0, 1] range using the standard normal cumulative distribution function, the present study demonstrates that linear transformation of the standardised scores, a method widely used in educational research, can be a simple and effective alternative. We use three TREC robust track data sets with graded relevance assessments and official runs to compare these methods by means of leave-one-out tests, discriminative power, swap rate tests, and topic set size design. In particular, we demonstrate that our method is superior to the method of Webber et al. in terms of swap rates and topic set size design: put simply, our method ensures pairwise system comparisons that are more consistent across different data sets, and is arguably more convenient for designing a new test collection from a statistical viewpoint.
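As a minimal sketch of the two mappings the abstract contrasts: each raw score in a topic-by-run matrix is first standardised per topic into a z-score, then mapped to [0, 1] either through the standard normal CDF (the Webber et al. approach) or through a clipped linear transformation in the T-score style common in educational research (the approach advocated here). The function names and the constants a = 0.15 and b = 0.5 below are illustrative assumptions for this sketch, not values taken from the text above.

```python
import numpy as np
from scipy.stats import norm

def standardise(raw):
    """Standardise a topic-by-run raw score matrix.

    raw: array of shape (n_topics, n_runs), e.g. an effectiveness
    score per topic per run. Returns z-scores: for each topic (row),
    how far each run is from the mean run on that topic, in
    standard deviation units.
    """
    mean = raw.mean(axis=1, keepdims=True)        # per-topic sample mean
    std = raw.std(axis=1, ddof=1, keepdims=True)  # per-topic sample std
    return (raw - mean) / std

def cdf_map(z):
    """Map z-scores to [0, 1] via the standard normal CDF (Webber et al.)."""
    return norm.cdf(z)

def linear_map(z, a=0.15, b=0.5):
    """Clipped linear transformation of z-scores (T-score style).

    The constants a=0.15, b=0.5 are hypothetical defaults chosen so
    that typical z-scores land inside [0, 1]; they are not quoted
    from the abstract.
    """
    return np.clip(a * z + b, 0.0, 1.0)

# Toy example: 3 topics x 4 runs of synthetic raw scores in [0, 1].
rng = np.random.default_rng(0)
raw = rng.uniform(0.0, 1.0, size=(3, 4))
z = standardise(raw)
print(cdf_map(z))
print(linear_map(z))
```

Note that wherever the linear map is not clipped, it is an affine function of z, so it preserves both the ordering of systems and the linear structure of the standardised scores; the CDF map also preserves ordering but compresses differences in the tails.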