{"title":"教育人工智能中公平性与绩效之间的权衡:分析基于OULAD的后处理偏见缓解","authors":"Sachini Gunasekara, Mirka Saarela","doi":"10.1016/j.infsof.2025.107933","DOIUrl":null,"url":null,"abstract":"<div><h3>Context:</h3><div>AI-driven educational tools often face a trade-off between fairness and performance, particularly when addressing biases across sensitive demographic attributes. While fairness metrics have been developed to monitor and mitigate bias, optimizing all of these metrics simultaneously is mathematically infeasible, and adjustments to fairness often result in a decrease in overall system performance.</div></div><div><h3>Objective:</h3><div>This study investigates the trade-off between predictive performance and fairness in educational AI systems, focusing on <em>gender</em> and <em>disability</em> as sensitive attributes. We evaluate whether post-processing fairness interventions can mitigate group-level disparities while preserving model usability.</div></div><div><h3>Method:</h3><div>Using the Open University Learning Analytics Dataset, we trained four machine learning models to predict student outcomes. We applied the equalized odds post-processing technique to mitigate bias and assessed model performance with accuracy, F1-score, and AUC, alongside fairness metrics including statistical parity difference (SPD) and equal opportunity difference (EOD). Statistical significance of changes was tested using the Wilcoxon signed-rank test.</div></div><div><h3>Results:</h3><div>All models achieved strong baseline predictive performance, with RF performing best overall. However, systematic disparities were evident, particularly for students with disabilities, showing that high accuracy does not necessarily ensure equitable outcomes. Post-processing reduced group-level disparities substantially, with SPD and EOD values approaching zero, though accuracy and F1-scores decreased slightly but significantly. RF and ANN were more resilient to fairness adjustments.</div></div><div><h3>Conclusion:</h3><div>This study highlights the importance of fairness-aware machine learning, such as post-processing interventions, and suggests that appropriate mitigation methods should be used to ensure benefits are distributed equitably across diverse learners, without favoring any particular fairness metric.</div></div>","PeriodicalId":54983,"journal":{"name":"Information and Software Technology","volume":"189 ","pages":"Article 107933"},"PeriodicalIF":4.3000,"publicationDate":"2025-10-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Trade-offs between fairness and performance in educational AI: Analyzing post-processing bias mitigation on the OULAD\",\"authors\":\"Sachini Gunasekara, Mirka Saarela\",\"doi\":\"10.1016/j.infsof.2025.107933\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><h3>Context:</h3><div>AI-driven educational tools often face a trade-off between fairness and performance, particularly when addressing biases across sensitive demographic attributes. While fairness metrics have been developed to monitor and mitigate bias, optimizing all of these metrics simultaneously is mathematically infeasible, and adjustments to fairness often result in a decrease in overall system performance.</div></div><div><h3>Objective:</h3><div>This study investigates the trade-off between predictive performance and fairness in educational AI systems, focusing on <em>gender</em> and <em>disability</em> as sensitive attributes. 
We evaluate whether post-processing fairness interventions can mitigate group-level disparities while preserving model usability.</div></div><div><h3>Method:</h3><div>Using the Open University Learning Analytics Dataset, we trained four machine learning models to predict student outcomes. We applied the equalized odds post-processing technique to mitigate bias and assessed model performance with accuracy, F1-score, and AUC, alongside fairness metrics including statistical parity difference (SPD) and equal opportunity difference (EOD). Statistical significance of changes was tested using the Wilcoxon signed-rank test.</div></div><div><h3>Results:</h3><div>All models achieved strong baseline predictive performance, with RF performing best overall. However, systematic disparities were evident, particularly for students with disabilities, showing that high accuracy does not necessarily ensure equitable outcomes. Post-processing reduced group-level disparities substantially, with SPD and EOD values approaching zero, though accuracy and F1-scores decreased slightly but significantly. RF and ANN were more resilient to fairness adjustments.</div></div><div><h3>Conclusion:</h3><div>This study highlights the importance of fairness-aware machine learning, such as post-processing interventions, and suggests that appropriate mitigation methods should be used to ensure benefits are distributed equitably across diverse learners, without favoring any particular fairness metric.</div></div>\",\"PeriodicalId\":54983,\"journal\":{\"name\":\"Information and Software Technology\",\"volume\":\"189 \",\"pages\":\"Article 107933\"},\"PeriodicalIF\":4.3000,\"publicationDate\":\"2025-10-17\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Information and Software Technology\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S0950584925002721\",\"RegionNum\":2,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q2\",\"JCRName\":\"COMPUTER SCIENCE, INFORMATION SYSTEMS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Information and Software Technology","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0950584925002721","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, INFORMATION SYSTEMS","Score":null,"Total":0}
Trade-offs between fairness and performance in educational AI: Analyzing post-processing bias mitigation on the OULAD
Context:
AI-driven educational tools often face a trade-off between fairness and performance, particularly when addressing biases across sensitive demographic attributes. While fairness metrics have been developed to monitor and mitigate bias, optimizing all of these metrics simultaneously is mathematically infeasible, and adjustments to fairness often result in a decrease in overall system performance.
Objective:
This study investigates the trade-off between predictive performance and fairness in educational AI systems, focusing on gender and disability as sensitive attributes. We evaluate whether post-processing fairness interventions can mitigate group-level disparities while preserving model usability.
Method:
Using the Open University Learning Analytics Dataset (OULAD), we trained four machine learning models to predict student outcomes. We applied the equalized odds post-processing technique to mitigate bias and assessed model performance with accuracy, F1-score, and AUC, alongside fairness metrics including statistical parity difference (SPD) and equal opportunity difference (EOD). Statistical significance of changes was tested using the Wilcoxon signed-rank test.
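As a concrete illustration of this kind of pipeline, the sketch below applies equalized-odds post-processing and computes SPD and EOD using the open-source fairlearn and scikit-learn libraries. The library choice, the random-forest baseline, and the data layout (a pandas DataFrame `df` with a binary `passed` label and a binary `disability` column as the sensitive attribute) are our assumptions for illustration, not the authors' exact pipeline.

```python
# Hypothetical sketch: equalized-odds post-processing with fairlearn.
# Assumes a pandas DataFrame `df` with feature columns, a binary label
# `passed`, and a binary sensitive attribute `disability` (illustrative names).
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import recall_score
from fairlearn.postprocessing import ThresholdOptimizer
from fairlearn.metrics import MetricFrame, selection_rate

X = df.drop(columns=["passed", "disability"])
y, s = df["passed"], df["disability"]
X_tr, X_te, y_tr, y_te, s_tr, s_te = train_test_split(X, y, s, random_state=0)

# Baseline model (a random forest, one of the model families in the study).
rf = RandomForestClassifier(random_state=0).fit(X_tr, y_tr)

# Post-process predicted scores so that true- and false-positive rates are
# (approximately) equalized across the sensitive groups.
eo = ThresholdOptimizer(estimator=rf, constraints="equalized_odds",
                        objective="accuracy_score", prefit=True,
                        predict_method="predict_proba")
eo.fit(X_tr, y_tr, sensitive_features=s_tr)
y_hat = eo.predict(X_te, sensitive_features=s_te)

# SPD = between-group difference in selection rates;
# EOD = between-group difference in true-positive rates (recall).
mf = MetricFrame(metrics={"selection_rate": selection_rate,
                          "tpr": recall_score},
                 y_true=y_te, y_pred=y_hat, sensitive_features=s_te)
print("SPD:", mf.difference()["selection_rate"])
print("EOD:", mf.difference()["tpr"])
```

SPD and EOD values near zero indicate that positive predictions, and true-positive rates respectively, are distributed similarly across the groups.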
Results:
All four models achieved strong baseline predictive performance, with the random forest (RF) performing best overall. However, systematic disparities were evident, particularly for students with disabilities, showing that high accuracy does not necessarily ensure equitable outcomes. Post-processing reduced group-level disparities substantially, with SPD and EOD values approaching zero, though accuracy and F1-scores showed small but statistically significant decreases. RF and the artificial neural network (ANN) were more resilient to the fairness adjustments than the other models.
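The significance test behind "slightly but significantly" can be reproduced with scipy's paired, non-parametric Wilcoxon signed-rank test. The per-fold accuracies below are placeholder values, assuming scores are collected over repeated cross-validation runs before and after post-processing; they are not the paper's actual numbers.

```python
# Hypothetical sketch: paired Wilcoxon signed-rank test on per-fold accuracy
# before vs. after equalized-odds post-processing (illustrative numbers only).
from scipy.stats import wilcoxon

acc_before = [0.86, 0.85, 0.87, 0.84, 0.86, 0.85, 0.88, 0.85, 0.86, 0.87]
acc_after  = [0.84, 0.83, 0.85, 0.83, 0.84, 0.84, 0.86, 0.83, 0.85, 0.85]

stat, p = wilcoxon(acc_before, acc_after)
print(f"W = {stat:.1f}, p = {p:.4f}")  # small p => the drop is significant
```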
Conclusion:
This study highlights the importance of fairness-aware machine learning, such as post-processing interventions, and suggests that appropriate mitigation methods should be used to ensure benefits are distributed equitably across diverse learners, without favoring any particular fairness metric.
Journal Introduction:
Information and Software Technology is the international archival journal focusing on research and experience that contributes to the improvement of software development practices. The journal's scope includes methods and techniques to better engineer software and manage its development. Articles submitted for review should have a clear component of software engineering or address ways to improve the engineering and management of software development. Areas covered by the journal include:
• Software management, quality and metrics
• Software processes
• Software architecture, modelling, specification, design and programming
• Functional and non-functional software requirements
• Software testing and verification & validation
• Empirical studies of all aspects of engineering and managing software development
Short Communications is a new section dedicated to short papers addressing new ideas, controversial opinions, "negative" results, and much more. Read the Guide for Authors for more information.
The journal encourages and welcomes submissions of systematic literature studies (reviews and maps) within the scope of the journal. Information and Software Technology is the premier outlet for systematic literature studies in software engineering.