Trade-offs between fairness and performance in educational AI: Analyzing post-processing bias mitigation on the OULAD

IF 4.3 | Zone 2, Computer Science | JCR Q2, COMPUTER SCIENCE, INFORMATION SYSTEMS

Information and Software Technology Pub Date : 2025-10-17 DOI:10.1016/j.infsof.2025.107933

Sachini Gunasekara, Mirka Saarela

Information and Software Technology, Volume 189, Article 107933. Journal Article. https://www.sciencedirect.com/science/article/pii/S0950584925002721

Citations: 0

Abstract

Context:

AI-driven educational tools often face a trade-off between fairness and performance, particularly when addressing biases across sensitive demographic attributes. While fairness metrics have been developed to monitor and mitigate bias, optimizing all of these metrics simultaneously is mathematically infeasible, and adjustments to fairness often result in a decrease in overall system performance.

Objective:

This study investigates the trade-off between predictive performance and fairness in educational AI systems, focusing on gender and disability as sensitive attributes. We evaluate whether post-processing fairness interventions can mitigate group-level disparities while preserving model usability.

Method:

Using the Open University Learning Analytics Dataset (OULAD), we trained four machine learning models to predict student outcomes. We applied the equalized odds post-processing technique to mitigate bias and assessed model performance with accuracy, F1-score, and AUC, alongside fairness metrics including statistical parity difference (SPD) and equal opportunity difference (EOD). The statistical significance of changes was tested using the Wilcoxon signed-rank test.
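The two fairness metrics named above can be illustrated with a minimal sketch. It assumes binary labels and predictions, a binary sensitive attribute coded 1 for the privileged group and 0 for the unprivileged group, and the common "unprivileged minus privileged" sign convention; the abstract does not state which convention the study uses, and the function names and toy data here are invented for illustration.

```python
def selection_rate(y_pred, mask):
    """Share of positive predictions among the samples selected by mask."""
    chosen = [p for p, m in zip(y_pred, mask) if m]
    return sum(chosen) / len(chosen)


def true_positive_rate(y_true, y_pred, mask):
    """Share of actual positives in the masked group that were predicted positive."""
    chosen = [p for t, p, m in zip(y_true, y_pred, mask) if m and t == 1]
    return sum(chosen) / len(chosen)


def fairness_gaps(y_true, y_pred, group):
    """Return (SPD, EOD), each computed as unprivileged minus privileged.

    Assumed coding: group == 1 is privileged, group == 0 is unprivileged.
    A value of 0 means parity on that metric.
    """
    unpriv = [g == 0 for g in group]
    priv = [g == 1 for g in group]
    spd = selection_rate(y_pred, unpriv) - selection_rate(y_pred, priv)
    eod = (true_positive_rate(y_true, y_pred, unpriv)
           - true_positive_rate(y_true, y_pred, priv))
    return spd, eod


# Toy data, invented for illustration only (not taken from OULAD).
y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 0, 0, 0, 1, 1, 1, 0]
group = [0, 0, 0, 0, 1, 1, 1, 1]

spd, eod = fairness_gaps(y_true, y_pred, group)
print(f"SPD = {spd:+.2f}, EOD = {eod:+.2f}")
```

On this toy data the unprivileged group receives fewer positive predictions and has a lower true positive rate, so both gaps are negative. Equalized odds post-processing aims to shrink gaps in both true positive and false positive rates by adjusting decisions after training, without retraining the model.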

Results:

All models achieved strong baseline predictive performance, with random forest (RF) performing best overall. However, systematic disparities were evident, particularly for students with disabilities, showing that high accuracy does not necessarily ensure equitable outcomes. Post-processing reduced group-level disparities substantially, with SPD and EOD values approaching zero, while accuracy and F1-scores showed a small but statistically significant decrease. RF and the artificial neural network (ANN) were more resilient to fairness adjustments.
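How a small drop can still be statistically significant is easier to see with a worked example of the Wilcoxon signed-rank test on paired scores. The sketch below is a pure-Python exact two-sided test suitable for small samples; the pairing unit (for example, cross-validation folds) and the accuracy values are assumptions made up for illustration, since the abstract does not report them.

```python
from itertools import product


def average_ranks(values):
    """Rank values in ascending order (1-based); ties share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def wilcoxon_signed_rank(before, after):
    """Exact two-sided Wilcoxon signed-rank test for paired samples.

    Returns (statistic, p_value), where statistic = min(W+, W-).
    Zero differences are dropped. The null distribution is enumerated
    over all 2**n sign assignments, so keep n small.
    """
    diffs = [a - b for b, a in zip(before, after) if a != b]
    if not diffs:
        raise ValueError("all paired differences are zero")
    ranks = average_ranks([abs(d) for d in diffs])
    total = sum(ranks)
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    statistic = min(w_plus, total - w_plus)
    extreme = 0
    for signs in product((0, 1), repeat=len(ranks)):
        wp = sum(r for r, s in zip(ranks, signs) if s)
        if min(wp, total - wp) <= statistic:
            extreme += 1
    return statistic, extreme / 2 ** len(ranks)


# Hypothetical per-fold accuracies before and after post-processing.
acc_before = [0.90, 0.88, 0.91, 0.87, 0.89]
acc_after = [0.89, 0.86, 0.88, 0.83, 0.84]

stat, p_value = wilcoxon_signed_rank(acc_before, acc_after)
print(f"statistic = {stat}, p = {p_value}")  # statistic = 0.0, p = 0.0625
```

Every pair decreases here, which is the most extreme outcome possible, yet with only five pairs the exact two-sided p-value cannot fall below 2/32 = 0.0625. A consistent small drop reaches conventional significance only with more paired observations, which is why the number of pairs matters when reading "slight but significant" results.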

Conclusion:

This study highlights the importance of fairness-aware machine learning, such as post-processing interventions, and suggests that appropriate mitigation methods should be used to ensure benefits are distributed equitably across diverse learners, without favoring any particular fairness metric.


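Source journal

Information and Software Technology (Engineering & Technology; Computer Science: Software Engineering)

CiteScore: 9.10

Self-citation rate: 7.70%

Articles published: 164

Review time: 9.6 weeks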

Journal description: Information and Software Technology is the international archival journal focusing on research and experience that contributes to the improvement of software development practices. The journal's scope includes methods and techniques to better engineer software and manage its development. Articles submitted for review should have a clear component of software engineering or address ways to improve the engineering and management of software development. Areas covered by the journal include:

• Software management, quality and metrics
• Software processes
• Software architecture, modelling, specification, design and programming
• Functional and non-functional software requirements
• Software testing and verification & validation
• Empirical studies of all aspects of engineering and managing software development

Short Communications is a new section dedicated to short papers addressing new ideas, controversial opinions, "Negative" results and much more. Read the Guide for authors for more information. The journal encourages and welcomes submissions of systematic literature studies (reviews and maps) within the scope of the journal. Information and Software Technology is the premier outlet for systematic literature studies in software engineering.