Measuring Model Fairness under Noisy Covariates: A Theoretical Perspective

Flavien Prost, Pranjal Awasthi, Nicholas Blumm, A. Kumthekar, Trevor Potter, Li Wei, Xuezhi Wang, Ed H. Chi, Jilin Chen, Alex Beutel
{"title":"Measuring Model Fairness under Noisy Covariates: A Theoretical Perspective","authors":"Flavien Prost, Pranjal Awasthi, Nicholas Blumm, A. Kumthekar, Trevor Potter, Li Wei, Xuezhi Wang, Ed H. Chi, Jilin Chen, Alex Beutel","doi":"10.1145/3461702.3462603","DOIUrl":null,"url":null,"abstract":"In this work we study the problem of measuring the fairness of a machine learning model under noisy information. Focusing on group fairness metrics, we investigate the particular but common situation when the evaluation requires controlling for the confounding effect of covariate variables. In a practical setting, we might not be able to jointly observe the covariate and group information, and a standard workaround is to then use proxies for one or more of these variables. Prior works have demonstrated the challenges with using a proxy for sensitive attributes, and strong independence assumptions are needed to provide guarantees on the accuracy of the noisy estimates. In contrast, in this work we study using a proxy for the covariate variable and present a theoretical analysis that aims to characterize weaker conditions under which accurate fairness evaluation is possible. Furthermore, our theory identifies potential sources of errors and decouples them into two interpretable parts y and E. The first part y depends solely on the performance of the proxy such as precision and recall, whereas the second part E captures correlations between all the variables of interest. We show that in many scenarios the error in the estimates is dominated by y via a linear dependence, whereas the dependence on the correlations E only constitutes a lower order term. As a result we expand the understanding of scenarios where measuring model fairness via proxies can be an effective approach. Finally, we compare, via simulations, the theoretical upper-bounds to the distribution of simulated estimation errors and show that assuming some structure on the data, even weak, is key to significantly improve both theoretical guarantees and empirical results.","PeriodicalId":197336,"journal":{"name":"Proceedings of the 2021 AAAI/ACM Conference on AI, Ethics, and Society","volume":"28 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-05-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"9","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 2021 AAAI/ACM Conference on AI, Ethics, and Society","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3461702.3462603","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 9

Abstract

In this work we study the problem of measuring the fairness of a machine learning model under noisy information. Focusing on group fairness metrics, we investigate the particular but common situation when the evaluation requires controlling for the confounding effect of covariate variables. In a practical setting, we might not be able to jointly observe the covariate and group information, and a standard workaround is to then use proxies for one or more of these variables. Prior works have demonstrated the challenges with using a proxy for sensitive attributes, and strong independence assumptions are needed to provide guarantees on the accuracy of the noisy estimates. In contrast, in this work we study using a proxy for the covariate variable and present a theoretical analysis that aims to characterize weaker conditions under which accurate fairness evaluation is possible. Furthermore, our theory identifies potential sources of errors and decouples them into two interpretable parts y and E. The first part y depends solely on the performance of the proxy such as precision and recall, whereas the second part E captures correlations between all the variables of interest. We show that in many scenarios the error in the estimates is dominated by y via a linear dependence, whereas the dependence on the correlations E only constitutes a lower order term. As a result we expand the understanding of scenarios where measuring model fairness via proxies can be an effective approach. Finally, we compare, via simulations, the theoretical upper-bounds to the distribution of simulated estimation errors and show that assuming some structure on the data, even weak, is key to significantly improve both theoretical guarantees and empirical results.
噪声协变量下模型公平性测量的理论视角
在这项工作中,我们研究了在噪声信息下测量机器学习模型公平性的问题。关注群体公平指标,我们研究了评估需要控制协变量混杂效应的特殊但常见的情况。在实际设置中,我们可能无法联合观察协变量和组信息,然后标准的解决方法是为这些变量中的一个或多个使用代理。先前的工作已经证明了使用敏感属性代理的挑战,并且需要强大的独立性假设来保证噪声估计的准确性。相比之下,在这项工作中,我们使用协变量的代理进行研究,并提出了一个理论分析,旨在描述可能进行准确公平评估的较弱条件。此外,我们的理论确定了潜在的错误来源,并将它们解耦为两个可解释的部分y和E。第一部分y仅取决于代理的性能,如精度和召回率,而第二部分E捕获所有感兴趣的变量之间的相关性。我们表明,在许多情况下,估计误差通过线性依赖由y主导,而对相关性E的依赖仅构成低阶项。因此,我们扩展了对通过代理测量模型公平性可能是一种有效方法的场景的理解。最后,我们通过模拟比较了理论上界与模拟估计误差的分布,并表明在数据上假设一些结构,即使是弱结构,是显著提高理论保证和经验结果的关键。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信