{"title":"Prune bias from the root: Bias removal and fairness estimation by pruning sensitive attributes in pre-trained DNN models","authors":"Qiaolin Qin, Ettore Merlo","doi":"10.1016/j.infsof.2025.107906","DOIUrl":null,"url":null,"abstract":"<div><h3>Context:</h3><div>Deep learning models (DNNs) are widely used in high-stakes decision-making domains, but they often inherit and amplify biases present in training data, leading to unfair predictions. Given this context, fairness estimation metrics and bias removal methods are required to select and enhance fair models. However, we found that existing metrics lack robustness in estimating multi-attribute group fairness. Further, existing post-processing bias removal methods often focus on group fairness and fail to address individual fairness or optimize along multiple sensitive attributes.</div></div><div><h3>Objective:</h3><div>In this study, we explore the effectiveness of attribute pruning (i.e., zeroing out sensitive attribute weights in a pre-trained DNN’s input layer) in both bias removal and multi-attribute group fairness estimation.</div></div><div><h3>Methods:</h3><div>To study attribute pruning’s impact on bias removal, we conducted experiments on 32 models and 4 widely used datasets, and compared its effect in single-attribute group bias removal and accuracy preservation with 3 baseline post-processing methods. We then leveraged 3 datasets with multiple sensitive attributes to demonstrate how to use attribute pruning for multi-attribute group fairness estimation.</div></div><div><h3>Results:</h3><div>Single-attribute pruning can better preserve model accuracy than conventional post-processing methods in 23 out of 32 cases, and enforces individual fairness by design. However, since individual fairness and group fairness are fundamentally different objectives, attribute pruning’s effect on group fairness metrics is often inconsistent. We also extend our approach to a multi-attribute setting, demonstrating its potential for improving individual fairness jointly across sensitive attributes and for enabling multi-attribute fairness-aware model selection.</div></div><div><h3>Conclusion:</h3><div>Attribute pruning is a practical post-processing approach for enforcing individual fairness, with limited and data-dependent impact on group fairness. These limitations reflect the inherent trade-off between individual and group fairness objectives. In addition, attribute pruning provides a useful mechanism for bias estimation, particularly in multi-attribute contexts. We advocate for its adoption as a comparison baseline in fairness-aware AI development and encourage further exploration.</div></div>","PeriodicalId":54983,"journal":{"name":"Information and Software Technology","volume":"189 ","pages":"Article 107906"},"PeriodicalIF":4.3000,"publicationDate":"2025-10-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Information and Software Technology","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0950584925002459","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, INFORMATION SYSTEMS","Score":null,"Total":0}
Citations: 0
Abstract
Context:
Deep neural network (DNN) models are widely used in high-stakes decision-making domains, but they often inherit and amplify biases present in training data, leading to unfair predictions. Given this context, fairness estimation metrics and bias removal methods are needed to select and enhance fair models. However, we found that existing metrics lack robustness in estimating multi-attribute group fairness. Further, existing post-processing bias removal methods often focus on group fairness and fail to address individual fairness or to optimize along multiple sensitive attributes.
Objective:
In this study, we explore the effectiveness of attribute pruning (i.e., zeroing out sensitive attribute weights in a pre-trained DNN’s input layer) in both bias removal and multi-attribute group fairness estimation.
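As a concrete illustration, the following is a minimal sketch of such a pruning operation, assuming a PyTorch model whose first nn.Linear layer consumes the raw feature vector. The architecture, helper name, and column indices are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

def prune_sensitive_attributes(model: nn.Module, sensitive_cols: list[int]) -> nn.Module:
    """Zero the input-layer weight columns of the given sensitive feature indices."""
    # nn.Linear stores weights with shape (out_features, in_features), so each
    # column of the weight matrix corresponds to one input feature; zeroing a
    # column removes that feature's direct influence on the network.
    first_linear = next(m for m in model.modules() if isinstance(m, nn.Linear))
    with torch.no_grad():
        first_linear.weight[:, sensitive_cols] = 0.0
    return model

# Hypothetical usage: prune feature columns 3 and 7 of a pre-trained classifier,
# then re-evaluate accuracy and fairness metrics on held-out data.
model = nn.Sequential(nn.Linear(10, 16), nn.ReLU(), nn.Linear(16, 1))
pruned = prune_sensitive_attributes(model, sensitive_cols=[3, 7])
```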
Methods:
To study attribute pruning’s impact on bias removal, we conducted experiments on 32 models across 4 widely used datasets, comparing its single-attribute group bias removal and accuracy preservation against 3 baseline post-processing methods. We then leveraged 3 datasets with multiple sensitive attributes to demonstrate how attribute pruning can be used for multi-attribute group fairness estimation.
Results:
Single-attribute pruning can better preserve model accuracy than conventional post-processing methods in 23 out of 32 cases, and enforces individual fairness by design. However, since individual fairness and group fairness are fundamentally different objectives, attribute pruning’s effect on group fairness metrics is often inconsistent. We also extend our approach to a multi-attribute setting, demonstrating its potential for improving individual fairness jointly across sensitive attributes and for enabling multi-attribute fairness-aware model selection.
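To see why pruning enforces individual fairness by design while leaving group fairness unconstrained, consider a toy demonstration (ours, not the paper’s): once the sensitive-attribute weights are zeroed, two “twin” inputs that differ only in that attribute receive identical predictions, yet group-level outcome rates can still diverge through non-sensitive features correlated with the attribute.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 1))
with torch.no_grad():
    model[0].weight[:, 0] = 0.0  # prune the sensitive feature at column 0

x_a = torch.tensor([[0.0, 1.2, -0.3, 0.7]])  # sensitive attribute = 0
x_b = torch.tensor([[1.0, 1.2, -0.3, 0.7]])  # sensitive attribute = 1, rest identical
assert torch.equal(model(x_a), model(x_b))   # identical treatment of "twins"
```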
Conclusion:
Attribute pruning is a practical post-processing approach for enforcing individual fairness, with limited and data-dependent impact on group fairness. These limitations reflect the inherent trade-off between individual and group fairness objectives. In addition, attribute pruning provides a useful mechanism for bias estimation, particularly in multi-attribute contexts. We advocate for its adoption as a comparison baseline in fairness-aware AI development and encourage further exploration.
About the Journal
Information and Software Technology is the international archival journal focusing on research and experience that contributes to the improvement of software development practices. The journal's scope includes methods and techniques to better engineer software and manage its development. Articles submitted for review should have a clear component of software engineering or address ways to improve the engineering and management of software development. Areas covered by the journal include:
• Software management, quality and metrics
• Software processes
• Software architecture, modelling, specification, design and programming
• Functional and non-functional software requirements
• Software testing and verification & validation
• Empirical studies of all aspects of engineering and managing software development
Short Communications is a new section dedicated to short papers addressing new ideas, controversial opinions, "Negative" results and much more. Read the Guide for authors for more information.
The journal encourages and welcomes submissions of systematic literature studies (reviews and maps) within the scope of the journal. Information and Software Technology is the premier outlet for systematic literature studies in software engineering.