FairBalance: How to Achieve Equalized Odds With Data Pre-Processing

IF 6.5, CAS Zone 1 (Computer Science), JCR Q1 COMPUTER SCIENCE, SOFTWARE ENGINEERING

IEEE Transactions on Software Engineering, Pub Date: 2024-07-22, DOI: 10.1109/TSE.2024.3431445

Zhe Yu; Joymallya Chakraborty; Tim Menzies

Volume 50, Issue 9, pp. 2294-2312. Full text: https://ieeexplore.ieee.org/document/10606107/

Citations: 0

Abstract

This research seeks to benefit the software engineering community by providing a simple yet effective pre-processing approach to achieve equalized odds fairness in machine learning software. Fairness issues have attracted growing attention as machine learning software is increasingly used for high-stakes and high-risk decisions. It is the responsibility of all software developers to make their software accountable by ensuring that machine learning software does not perform differently on different sensitive demographic groups, that is, that it satisfies equalized odds. Prior works either optimize for an equalized-odds-related metric during the learning process as a black box, or manipulate the training data following some intuition; in contrast, this work studies the root cause of the violation of equalized odds and how to tackle it. We found that equalizing the class distribution in each demographic group with sample weights is a necessary condition for achieving equalized odds without modifying the normal training process. In addition, an important partial condition for equalized odds (zero average odds difference) can be guaranteed when the class distributions are weighted to be not only equal but also balanced (1:1). Based on these analyses, we proposed FairBalance, a pre-processing algorithm that balances the class distribution in each demographic group by assigning calculated weights to the training data. On eight real-world datasets, our empirical results show that, at low computational overhead, FairBalance can significantly improve equalized odds with little, if any, damage to utility. FairBalance also outperforms existing state-of-the-art approaches in terms of equalized odds. To facilitate reuse, reproduction, and validation, we made our scripts available at https://github.com/hil-se/FairBalance.
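The weighting idea described in the abstract can be illustrated with a minimal sketch. It is not taken from the authors' repository: the function names and the toy data are hypothetical, and the formula used here (group size divided by the size of the group-and-class cell) is an assumption chosen because it makes the weighted class distribution 1:1 inside every demographic group. The paper's exact definition may differ.

```python
from collections import Counter


def fairbalance_weights(groups, labels):
    """Return one weight per sample so that, inside every demographic
    group, each class carries the same total weight.

    Assumed formula (illustrative, not confirmed by the abstract):
        w(a, y) = |A = a| / |A = a, Y = y|
    """
    if len(groups) != len(labels):
        raise ValueError("groups and labels must have the same length")
    group_sizes = Counter(groups)              # |A = a|
    cell_sizes = Counter(zip(groups, labels))  # |A = a, Y = y|
    return [group_sizes[g] / cell_sizes[(g, y)] for g, y in zip(groups, labels)]


def weighted_class_totals(groups, labels, weights):
    """Sum the weights in each (group, class) cell to inspect the balance."""
    totals = {}
    for g, y, w in zip(groups, labels, weights):
        totals[(g, y)] = totals.get((g, y), 0.0) + w
    return totals


# Hypothetical toy data: group "f" is 1:3 positive:negative, group "m" is 4:2.
groups = ["f", "f", "f", "f", "m", "m", "m", "m", "m", "m"]
labels = [1, 0, 0, 0, 1, 1, 1, 1, 0, 0]

weights = fairbalance_weights(groups, labels)
totals = weighted_class_totals(groups, labels, weights)

# After weighting, both classes have equal total weight within each group.
for (group, label), total in sorted(totals.items()):
    print(group, label, round(total, 6))
```

Because the result is an ordinary list of per-sample weights, it could be passed to any learner that accepts sample weights (for example through the `sample_weight` argument of a scikit-learn estimator's `fit` method), which matches the abstract's point that the normal training process itself is left unmodified.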


Source journal

IEEE Transactions on Software Engineering (Engineering & Technology: Engineering, Electrical & Electronic)

CiteScore

9.70

Self-citation rate

10.80%

Publication volume

724

Review time

6 months

About the journal: IEEE Transactions on Software Engineering seeks contributions comprising well-defined theoretical results and empirical studies with potential impacts on software construction, analysis, or management. The scope of this Transactions extends from fundamental mechanisms to the development of principles and their application in specific environments. Specific topic areas include: a) Development and maintenance methods and models: Techniques and principles for specifying, designing, and implementing software systems, encompassing notations and process models. b) Assessment methods: Software tests, validation, reliability models, test and diagnosis procedures, software redundancy, design for error control, and measurements and evaluation of process and product aspects. c) Software project management: Productivity factors, cost models, schedule and organizational issues, and standards. d) Tools and environments: Specific tools, integrated tool environments, associated architectures, databases, and parallel and distributed processing issues. e) System issues: Hardware-software trade-offs. f) State-of-the-art surveys: Syntheses and comprehensive reviews of the historical development within specific areas of interest.