{"title":"Measuring Group Advantage: A Comparative Study of Fair Ranking Metrics","authors":"C. Kuhlman, Walter Gerych, Elke A. Rundensteiner","doi":"10.1145/3461702.3462588","DOIUrl":null,"url":null,"abstract":"Ranking evaluation metrics play an important role in information retrieval, providing optimization objectives during development and means of assessment of deployed performance. Recently, fairness of rankings has been recognized as crucial, especially as automated systems are increasingly used for high impact decisions. While numerous fairness metrics have been proposed, a comparative analysis to understand their interrelationships is lacking. Even for fundamental statistical parity metrics which measure group advantage, it remains unclear whether metrics measure the same phenomena, or when one metric may produce different results than another. To address these open questions, we formulate a conceptual framework for analytical comparison of metrics. We prove that under reasonable assumptions, popular metrics in the literature exhibit the same behavior and that optimizing for one optimizes for all. However, our analysis also shows that the metrics vary in the degree of unfairness measured, in particular when one group has a strong majority. Based on this analysis, we design a practical statistical test to identify whether observed data is likely to exhibit predictable group bias. We provide a set of recommendations for practitioners to guide the choice of an appropriate fairness metric.","PeriodicalId":197336,"journal":{"name":"Proceedings of the 2021 AAAI/ACM Conference on AI, Ethics, and Society","volume":"48 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-07-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"16","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 2021 AAAI/ACM Conference on AI, Ethics, and Society","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3461702.3462588","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 16
Abstract
Ranking evaluation metrics play an important role in information retrieval, providing optimization objectives during development and a means of assessing performance after deployment. Recently, the fairness of rankings has been recognized as crucial, especially as automated systems are increasingly used for high-impact decisions. While numerous fairness metrics have been proposed, a comparative analysis of their interrelationships is lacking. Even for fundamental statistical parity metrics, which measure group advantage, it remains unclear whether different metrics measure the same phenomena, or when one metric may produce results that differ from another's. To address these open questions, we formulate a conceptual framework for the analytical comparison of metrics. We prove that, under reasonable assumptions, popular metrics in the literature exhibit the same behavior and that optimizing for one optimizes for all. However, our analysis also shows that the metrics vary in the degree of unfairness they measure, particularly when one group holds a strong majority. Based on this analysis, we design a practical statistical test to identify whether observed data are likely to exhibit predictable group bias. We conclude with a set of recommendations to guide practitioners in choosing an appropriate fairness metric.
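To make the notion of a statistical-parity-style group advantage metric concrete, the sketch below computes two simple measures that commonly appear in the fair-ranking literature: average position-discounted exposure per group and each group's share of the top-k positions. This is an illustration only, assuming a logarithmic position discount and binary group labels; it is not the specific family of metrics or the statistical test analyzed in the paper.

```python
# Illustrative sketch (not the paper's exact formulation): two
# statistical-parity-style measures of group advantage in a ranking.
# The log2 position discount, the toy group labels, and the choice of
# metrics are assumptions made for this example.

import math
from collections import defaultdict


def group_exposure(ranking, groups):
    """Average exposure per group, using a 1/log2(rank+1) position discount."""
    totals, counts = defaultdict(float), defaultdict(int)
    for rank, item in enumerate(ranking, start=1):
        g = groups[item]
        totals[g] += 1.0 / math.log2(rank + 1)
        counts[g] += 1
    return {g: totals[g] / counts[g] for g in totals}


def top_k_share(ranking, groups, k):
    """Fraction of the top-k positions occupied by each group."""
    top = ranking[:k]
    share = defaultdict(float)
    for item in top:
        share[groups[item]] += 1.0 / len(top)
    return dict(share)


if __name__ == "__main__":
    # Toy ranking of candidate ids; group 'A' holds a strong majority.
    ranking = ["c1", "c2", "c3", "c4", "c5", "c6"]
    groups = {"c1": "A", "c2": "A", "c3": "B", "c4": "A", "c5": "A", "c6": "B"}
    print(group_exposure(ranking, groups))    # per-group average exposure
    print(top_k_share(ranking, groups, k=3))  # per-group share of top 3
```

Comparing the per-group values returned by either function (e.g., as a ratio or difference) yields a group-advantage score in the spirit of the metrics the paper compares; the two measures can disagree on the degree of unfairness, which is the kind of divergence the abstract highlights when one group holds a strong majority.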