How much freedom does an effectiveness metric really have?

IF 4.3 | Zone 2, Management | Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS

Journal of the Association for Information Science and Technology Pub Date : 2024-02-15 DOI:10.1002/asi.24874

Alistair Moffat, Joel Mackenzie

Volume 75, issue 6, pages 686-703. Publication type: Journal Article. Full text: https://onlinelibrary.wiley.com/doi/10.1002/asi.24874

Citations: 0



How much freedom does an effectiveness metric really have?

It is tempting to assume that because effectiveness metrics have free choice to assign scores to search engine result pages (SERPs) there must thus be a similar degree of freedom as to the relative order that SERP pairs can be put into. In fact that second freedom is, to a considerable degree, illusory. That is because if one SERP in a pair has been given a certain score by a metric, fundamental ordering constraints in many cases then dictate that the score for the second SERP must be either not less than, or not greater than, the score assigned to the first SERP. We refer to these fixed relationships as innate pairwise SERP orderings. Our first goal in this work is to describe and defend those pairwise SERP relationship constraints, and tabulate their relative occurrence via both exhaustive and empirical experimentation. We then consider how to employ such innate pairwise relationships in IR experiments, leading to a proposal for a new measurement paradigm. Specifically, we argue that tables of results in which many different metrics are listed for champion versus challenger system comparisons should be avoided; and that instead a single metric be argued for in principled terms, with any relationships identified by that metric then reinforced via an assessment of the innate relationship as to whether other metrics are likely to yield the same system-versus-system outcome.
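The idea of an innate pairwise ordering can be illustrated with a minimal sketch. It rests on assumptions that are not taken from the paper: relevance is binary, both SERPs have the same length, and one SERP is treated as innately no worse than another when it has at least as many relevant documents in every prefix. The function names (`innate_order`, `precision`, `rbp`, `constrained_fraction`) and the RBP persistence value are hypothetical choices for illustration; the paper's own constraint definitions may differ.

```python
from itertools import accumulate, combinations, product


def innate_order(a, b):
    """Compare two equal-length binary SERPs under the assumed prefix rule.

    Returns ">=" if a has at least as many relevant documents as b in
    every prefix, "<=" for the reverse, "==" if both hold, and None if
    neither holds (the pair is left free for a metric to decide).
    """
    ca, cb = list(accumulate(a)), list(accumulate(b))
    a_ge = all(x >= y for x, y in zip(ca, cb))
    b_ge = all(y >= x for x, y in zip(ca, cb))
    if a_ge and b_ge:
        return "=="
    if a_ge:
        return ">="
    if b_ge:
        return "<="
    return None


def precision(serp):
    """Fraction of relevant documents in the SERP."""
    return sum(serp) / len(serp)


def rbp(serp, p=0.8):
    """Rank-biased precision with persistence p (an illustrative value)."""
    return (1 - p) * sum(r * p**i for i, r in enumerate(serp))


def constrained_fraction(k):
    """Fraction of distinct SERP pairs of length k with a fixed ordering."""
    serps = list(product((0, 1), repeat=k))
    pairs = list(combinations(serps, 2))
    fixed = sum(innate_order(a, b) is not None for a, b in pairs)
    return fixed / len(pairs)
```

Under these assumptions, the pair `(1, 0, 0)` and `(0, 1, 1)` has no fixed ordering: `precision` prefers the second SERP, while `rbp` with `p=0.5` prefers the first. Any pair for which `innate_order` returns `">="` is scored in that same direction by both metrics, which is the sense in which the second freedom is limited.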


Source journal

Journal of the Association for Information Science and Technology (COMPUTER SCIENCE, INFORMATION SYSTEMS)

CiteScore: 8.30
Self-citation rate: 8.60%
Articles published: 115

About the journal: The Journal of the Association for Information Science and Technology (JASIST) is a leading international forum for peer-reviewed research in information science. For more than half a century, JASIST has provided intellectual leadership by publishing original research that focuses on the production, discovery, recording, storage, representation, retrieval, presentation, manipulation, dissemination, use, and evaluation of information and on the tools and techniques associated with these processes. The Journal welcomes rigorous work of an empirical, experimental, ethnographic, conceptual, historical, socio-technical, policy-analytic, or critical-theoretical nature. JASIST also commissions in-depth review articles (“Advances in Information Science”) and reviews of print and other media.