Farzaneh Dehghani, Pedro Paiva, Nikita Malik, Joanna Lin, Sayeh Bayat, Mariana Bento
{"title":"医疗保健机器学习中的准确性和公平性权衡:减轻偏倚策略的定量评估","authors":"Farzaneh Dehghani , Pedro Paiva , Nikita Malik , Joanna Lin , Sayeh Bayat , Mariana Bento","doi":"10.1016/j.infsof.2025.107896","DOIUrl":null,"url":null,"abstract":"<div><h3>Context:</h3><div>Although machine learning (ML) has significant potential to improve healthcare decision-making, embedded biases in algorithms and datasets risk exacerbating health disparities across demographic groups. To address this challenge, it is essential to rigorously evaluate bias mitigation strategies to ensure fairness and reliability across patient populations.</div></div><div><h3>Objective:</h3><div>The aim of this research is to propose a comprehensive evaluation framework that systematically assesses a wide range of bias mitigation techniques at pre-processing, in-processing, and post-processing stages, using both single- and multi-stage intervention approaches.</div></div><div><h3>Methods:</h3><div>This study evaluates bias mitigation strategies across three clinical prediction tasks: breast cancer diagnosis, stroke prediction, and Alzheimer’s disease detection. Our evaluation employs group- and individual-level fairness metrics, contextualized for specific sensitive attributes relevant to each dataset. Beyond fairness-accuracy trade-offs, we demonstrate how metric selection must align with clinical goals (e.g., parity metrics for equitable access, confusion-matrix metrics for diagnostics).</div></div><div><h3>Results:</h3><div>Our results reinforce that no single classifier or mitigation strategy is universally optimal, underscoring the value of our proposed framework for evaluating fairness and accuracy throughout the bias mitigation process. According to the results, Adversarial Debiasing improved fairness by 95% in breast cancer diagnosis without compromising accuracy. Reweighing was most effective in stroke prediction, boosting fairness by 41%, and Reject Option Classification yielded nearly 50% fairness improvement in Alzheimer’s detection. Multi-stage bias mitigation did not consistently lead to better outcomes, and in many cases, fairness gains came at the expense of accuracy.</div></div><div><h3>Conclusion:</h3><div>These findings provide practical guidance for selecting fairness-aware machine learning strategies in healthcare, aiding both model development and benchmarking across diverse clinical applications.</div></div>","PeriodicalId":54983,"journal":{"name":"Information and Software Technology","volume":"188 ","pages":"Article 107896"},"PeriodicalIF":4.3000,"publicationDate":"2025-09-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Accuracy-fairness trade-off in ML for healthcare: A quantitative evaluation of bias mitigation strategies\",\"authors\":\"Farzaneh Dehghani , Pedro Paiva , Nikita Malik , Joanna Lin , Sayeh Bayat , Mariana Bento\",\"doi\":\"10.1016/j.infsof.2025.107896\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><h3>Context:</h3><div>Although machine learning (ML) has significant potential to improve healthcare decision-making, embedded biases in algorithms and datasets risk exacerbating health disparities across demographic groups. 
To address this challenge, it is essential to rigorously evaluate bias mitigation strategies to ensure fairness and reliability across patient populations.</div></div><div><h3>Objective:</h3><div>The aim of this research is to propose a comprehensive evaluation framework that systematically assesses a wide range of bias mitigation techniques at pre-processing, in-processing, and post-processing stages, using both single- and multi-stage intervention approaches.</div></div><div><h3>Methods:</h3><div>This study evaluates bias mitigation strategies across three clinical prediction tasks: breast cancer diagnosis, stroke prediction, and Alzheimer’s disease detection. Our evaluation employs group- and individual-level fairness metrics, contextualized for specific sensitive attributes relevant to each dataset. Beyond fairness-accuracy trade-offs, we demonstrate how metric selection must align with clinical goals (e.g., parity metrics for equitable access, confusion-matrix metrics for diagnostics).</div></div><div><h3>Results:</h3><div>Our results reinforce that no single classifier or mitigation strategy is universally optimal, underscoring the value of our proposed framework for evaluating fairness and accuracy throughout the bias mitigation process. According to the results, Adversarial Debiasing improved fairness by 95% in breast cancer diagnosis without compromising accuracy. Reweighing was most effective in stroke prediction, boosting fairness by 41%, and Reject Option Classification yielded nearly 50% fairness improvement in Alzheimer’s detection. Multi-stage bias mitigation did not consistently lead to better outcomes, and in many cases, fairness gains came at the expense of accuracy.</div></div><div><h3>Conclusion:</h3><div>These findings provide practical guidance for selecting fairness-aware machine learning strategies in healthcare, aiding both model development and benchmarking across diverse clinical applications.</div></div>\",\"PeriodicalId\":54983,\"journal\":{\"name\":\"Information and Software Technology\",\"volume\":\"188 \",\"pages\":\"Article 107896\"},\"PeriodicalIF\":4.3000,\"publicationDate\":\"2025-09-19\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Information and Software Technology\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S0950584925002356\",\"RegionNum\":2,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q2\",\"JCRName\":\"COMPUTER SCIENCE, INFORMATION SYSTEMS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Information and Software Technology","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0950584925002356","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, INFORMATION SYSTEMS","Score":null,"Total":0}
Accuracy-fairness trade-off in ML for healthcare: A quantitative evaluation of bias mitigation strategies
Context:
Although machine learning (ML) has significant potential to improve healthcare decision-making, embedded biases in algorithms and datasets risk exacerbating health disparities across demographic groups. To address this challenge, it is essential to rigorously evaluate bias mitigation strategies to ensure fairness and reliability across patient populations.
Objective:
The aim of this research is to propose a comprehensive evaluation framework that systematically assesses a wide range of bias mitigation techniques at pre-processing, in-processing, and post-processing stages, using both single- and multi-stage intervention approaches.
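To make the framework's shape concrete, here is a hedged sketch of the kind of evaluation harness it implies: sweep single- and multi-stage combinations of interventions and record accuracy alongside a fairness gap for each configuration. The stage functions, thresholds, and synthetic data below are our own placeholders, not the paper's implementation.

```python
# A sketch only: a placeholder pre-processing stage (sample weights) and a
# crude post-processing knob (decision threshold), swept in combination.
from itertools import product

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

def no_op_weights(y, a):
    """Baseline 'no pre-processing' stage: uniform sample weights."""
    return np.ones(len(y))

def parity_gap(y_pred, a):
    """Placeholder fairness metric: gap in positive-prediction rates."""
    return abs(y_pred[a == 1].mean() - y_pred[a == 0].mean())

# Synthetic data with a binary sensitive attribute correlated with the label.
rng = np.random.default_rng(0)
X = rng.normal(size=(600, 4))
a = (X[:, 0] > 0).astype(int)
y = ((X[:, 1] + 0.7 * a) > 0).astype(int)
Xtr, Xte, ytr, yte, atr, ate = train_test_split(X, y, a, random_state=0)

pre_stages = {"none": no_op_weights}             # e.g., slot Reweighing in here
post_thresholds = {"none": 0.5, "shifted": 0.4}  # crude post-processing stage

for pre, post in product(pre_stages, post_thresholds):
    clf = LogisticRegression().fit(Xtr, ytr,
                                   sample_weight=pre_stages[pre](ytr, atr))
    y_hat = (clf.predict_proba(Xte)[:, 1] >= post_thresholds[post]).astype(int)
    print(pre, post,
          f"acc={(y_hat == yte).mean():.3f}",
          f"gap={parity_gap(y_hat, ate):.3f}")
```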
Methods:
This study evaluates bias mitigation strategies across three clinical prediction tasks: breast cancer diagnosis, stroke prediction, and Alzheimer’s disease detection. Our evaluation employs group- and individual-level fairness metrics, contextualized for specific sensitive attributes relevant to each dataset. Beyond fairness-accuracy trade-offs, we demonstrate how metric selection must align with clinical goals (e.g., parity metrics for equitable access, confusion-matrix metrics for diagnostics).
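The two metric families named above are easy to state precisely. Below is a minimal, illustrative implementation of one parity metric (demographic parity difference) and one confusion-matrix metric (equal-opportunity difference, i.e., the true-positive-rate gap); the function names and the binary-attribute setup are our assumptions, not necessarily the exact definitions used in the paper.

```python
# Illustrative group-fairness metrics for a binary sensitive attribute.
import numpy as np

def demographic_parity_diff(y_pred, group):
    """Parity metric: gap in positive-prediction rates between groups."""
    y_pred, g = np.asarray(y_pred), np.asarray(group, dtype=bool)
    return abs(y_pred[g].mean() - y_pred[~g].mean())

def equal_opportunity_diff(y_true, y_pred, group):
    """Confusion-matrix metric: gap in true-positive rates between groups."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    g = np.asarray(group, dtype=bool)
    tpr = lambda mask: y_pred[mask & (y_true == 1)].mean()
    return abs(tpr(g) - tpr(~g))

# Toy usage with synthetic labels, predictions, and group membership:
rng = np.random.default_rng(0)
y_true, y_pred, group = (rng.integers(0, 2, 200) for _ in range(3))
print(demographic_parity_diff(y_pred, group))
print(equal_opportunity_diff(y_true, y_pred, group))
```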
Results:
Our results reinforce that no single classifier or mitigation strategy is universally optimal, underscoring the value of our proposed framework for evaluating fairness and accuracy throughout the bias mitigation process. Adversarial Debiasing improved fairness by 95% in breast cancer diagnosis without compromising accuracy; Reweighing was most effective in stroke prediction, boosting fairness by 41%; and Reject Option Classification yielded a nearly 50% fairness improvement in Alzheimer’s detection. Multi-stage bias mitigation did not consistently lead to better outcomes, and in many cases fairness gains came at the expense of accuracy.
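Of the techniques named here, Reweighing (Kamiran and Calders) is the simplest to sketch: each (group, label) cell is weighted so that the sensitive attribute and the label appear statistically independent, and the weights are passed to any classifier that accepts them. The sketch below is a from-scratch illustration of that idea, not the authors' pipeline (a library implementation such as AIF360's could be used instead).

```python
# A minimal sketch of Reweighing: weight each (group, label) cell by
# P(A=a) * P(Y=y) / P(A=a, Y=y), then train with those sample weights.
import numpy as np
from sklearn.linear_model import LogisticRegression

def reweighing_weights(y, a):
    y, a = np.asarray(y), np.asarray(a)
    w = np.empty(len(y), dtype=float)
    for av in np.unique(a):
        for yv in np.unique(y):
            cell = (a == av) & (y == yv)
            # expected mass under independence divided by observed mass
            w[cell] = ((a == av).mean() * (y == yv).mean()) / max(cell.mean(), 1e-12)
    return w

# Toy usage: a synthetic dataset whose label is correlated with the group.
rng = np.random.default_rng(1)
X = rng.normal(size=(500, 3))
a = (X[:, 0] > 0).astype(int)               # stand-in sensitive attribute
y = ((X[:, 1] + 0.8 * a) > 0).astype(int)   # label leaks group information
clf = LogisticRegression().fit(X, y, sample_weight=reweighing_weights(y, a))
```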
Conclusion:
These findings provide practical guidance for selecting fairness-aware machine learning strategies in healthcare, aiding both model development and benchmarking across diverse clinical applications.
About the journal:
Information and Software Technology is the international archival journal focusing on research and experience that contributes to the improvement of software development practices. The journal's scope includes methods and techniques to better engineer software and manage its development. Articles submitted for review should have a clear component of software engineering or address ways to improve the engineering and management of software development. Areas covered by the journal include:
• Software management, quality and metrics
• Software processes
• Software architecture, modelling, specification, design and programming
• Functional and non-functional software requirements
• Software testing and verification & validation
• Empirical studies of all aspects of engineering and managing software development
Short Communications is a new section dedicated to short papers addressing new ideas, controversial opinions, negative results, and much more. Read the Guide for Authors for more information.
The journal encourages and welcomes submissions of systematic literature studies (reviews and maps) within its scope. Information and Software Technology is the premier outlet for systematic literature studies in software engineering.