Offline Evaluation without Gain

C. Clarke, Alexandra Vtyurina, Mark D. Smucker
{"title":"Offline Evaluation without Gain","authors":"C. Clarke, Alexandra Vtyurina, Mark D. Smucker","doi":"10.1145/3409256.3409816","DOIUrl":null,"url":null,"abstract":"We propose a simple and flexible framework for offline evaluation based on a weak ordering of results (which we call \"partial preferences\") that define a set of ideal rankings for a query. These partial preferences can be derived from from side-by-side preference judgments, from graded judgments, from a combination of the two, or through other methods. We then measure the performance of a ranker by computing the maximum similarity between the actual ranking it generates for the query and elements of this ideal result set. We call this measure the \"compatibility\" of the actual ranking with the ideal result set. We demonstrate that compatibility can replace and extend current offline evaluation measures that depend on fixed relevance grades that must be mapped to gain values, such as NDCG. We examine a specific instance of compatibility based on rank biased overlap (RBO). We experimentally validate compatibility over multiple collections with different types of partial preferences, including very fine-grained preferences and partial preferences focused on the top ranks. As well as providing additional insights and flexibility, compatibility avoids shortcomings of both full preference judgments and traditional graded judgments.","PeriodicalId":430907,"journal":{"name":"Proceedings of the 2020 ACM SIGIR on International Conference on Theory of Information Retrieval","volume":"14 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2020-09-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"18","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 2020 ACM SIGIR on International Conference on Theory of Information Retrieval","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3409256.3409816","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 18

Abstract

We propose a simple and flexible framework for offline evaluation based on a weak ordering of results (which we call "partial preferences") that define a set of ideal rankings for a query. These partial preferences can be derived from from side-by-side preference judgments, from graded judgments, from a combination of the two, or through other methods. We then measure the performance of a ranker by computing the maximum similarity between the actual ranking it generates for the query and elements of this ideal result set. We call this measure the "compatibility" of the actual ranking with the ideal result set. We demonstrate that compatibility can replace and extend current offline evaluation measures that depend on fixed relevance grades that must be mapped to gain values, such as NDCG. We examine a specific instance of compatibility based on rank biased overlap (RBO). We experimentally validate compatibility over multiple collections with different types of partial preferences, including very fine-grained preferences and partial preferences focused on the top ranks. As well as providing additional insights and flexibility, compatibility avoids shortcomings of both full preference judgments and traditional graded judgments.
无增益离线评估
我们提出了一个简单而灵活的离线评估框架,该框架基于结果的弱排序(我们称之为“部分偏好”),它为查询定义了一组理想的排名。这些部分偏好可以从并排偏好判断、分级判断、两者结合或通过其他方法获得。然后,我们通过计算它为查询生成的实际排名与此理想结果集的元素之间的最大相似度来衡量排名器的性能。我们称这种度量为实际排名与理想结果集的“兼容性”。我们证明了兼容性可以取代和扩展当前的离线评估方法,这些方法依赖于必须映射到获得值的固定相关等级,例如NDCG。我们研究了一个基于秩偏重叠(RBO)的兼容性的具体实例。我们通过实验验证了具有不同类型部分首选项的多个集合的兼容性,包括非常细粒度的首选项和专注于顶级的部分首选项。除了提供额外的洞察力和灵活性外,兼容性还避免了完全偏好判断和传统分级判断的缺点。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:604180095
Book学术官方微信