{"title":"When and how biases seep in: Enhancing debiasing approaches for fair educational predictive analytics","authors":"Lin Li, Namrata Srivastava, Jia Rong, Quanlong Guan, Dragan Gašević, Guanliang Chen","doi":"10.1111/bjet.13575","DOIUrl":null,"url":null,"abstract":"<div>\n \n \n <section>\n \n <p>The use of predictive analytics powered by machine learning (ML) to model educational data has increasingly been identified to exhibit bias towards marginalized populations, prompting the need for more equitable applications of these techniques. To tackle bias that emerges in training data or models at different stages of the ML modelling pipeline, numerous debiasing approaches have been proposed. Yet, research into state-of-the-art techniques for effectively employing these approaches to enhance fairness in educational predictive scenarios remains limited. Prior studies often focused on mitigating bias from a single source at a specific stage of model construction within narrowly defined scenarios, overlooking the complexities of bias originating from multiple sources across various stages. Moreover, these approaches were often evaluated using typical threshold-dependent fairness metrics, which fail to account for real-world educational scenarios where thresholds are typically unknown before evaluation. To bridge these gaps, this study systematically examined a total of 28 representative debiasing approaches, categorized by the sources of bias and the stage they targeted, for two critical educational predictive tasks, namely forum post classification and student career prediction. Both tasks involve a two-phase modelling process where features learned from upstream models in the first phase are fed into classical ML models for final predictions, which is a common yet under-explored setting for educational data modelling. The study observed that addressing local stereotypical bias, label bias or proxy discrimination in training data, as well as imposing fairness constraints on models, can effectively enhance predictive fairness. But their efficacy was often compromised when features from upstream models were inherently biased. Beyond that, this study proposes two novel strategies, namely Multi-Stage and Multi-Source debiasing to integrate existing approaches. 
These strategies demonstrated substantial improvements in mitigating unfairness, underscoring the importance of unified approaches capable of addressing biases from various sources across multiple stages.</p>\n </section>\n \n <section>\n \n <div>\n \n <div>\n \n <h3>Practitioner notes</h3>\n <p>What is already known about this topic\n\n </p><ul>\n \n <li>Predictive analytics for educational data modelling often exhibit bias against students from certain demographic groups based on sensitive attributes.</li>\n \n <li>Bias can emerge in training data or models at different time points of the ML modelling pipeline, resulting in unfair final predictions.</li>\n \n <li>Numerous debiasing approaches have been developed to tackle bias at different stages, including pre-processing training data, in-processing models, and post-processing predicted outcomes or trained models.</li>\n </ul>\n <p>What this paper adds\n\n </p><ul>\n \n <li>A systematic evaluation of 28 state-of-the-art debiasing approaches covering multiple sources of biases and multiple stages across two different educational predictive scenarios, identifying leading sources of data biases contributing to predictive unfairness.</li>\n \n <li>Further enhancing predictive fairness with proposed debiasing strategies considering the multi-source and multi-stage characteristics of biases.</li>\n \n <li>Revealing potential risks of debiasing focused on a single sensitive attribute.</li>\n </ul>\n <p>Implications for practitioners\n\n </p><ul>\n \n <li>Pre-processing approaches, particularly those addressing stereotypical bias, label bias and proxy discrimination, are generally effective for improving fairness in educational predictions. Re-weighing methods are especially useful for smaller datasets to tackle stereotypical bias.</li>\n \n <li>When dealing with two-phase modelling, biases inherently encoded in the features generated from upstream models might not be effectively addressed by debiasing approaches applied to downstream models.</li>\n \n <li>Combining debiasing approaches to tackle multiple sources of biases across multiple stages significantly enhances predictive fairness.</li>\n </ul>\n </div>\n </div>\n </section>\n </div>","PeriodicalId":48315,"journal":{"name":"British Journal of Educational Technology","volume":"56 6","pages":"2478-2501"},"PeriodicalIF":8.1000,"publicationDate":"2025-03-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://bera-journals.onlinelibrary.wiley.com/doi/epdf/10.1111/bjet.13575","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"British Journal of Educational Technology","FirstCategoryId":"95","ListUrlMain":"https://bera-journals.onlinelibrary.wiley.com/doi/10.1111/bjet.13575","RegionNum":1,"RegionCategory":"教育学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"EDUCATION & EDUCATIONAL RESEARCH","Score":null,"Total":0}
Abstract
Predictive analytics powered by machine learning (ML) for modelling educational data has increasingly been found to exhibit bias against marginalized populations, prompting the need for more equitable applications of these techniques. To tackle bias that emerges in training data or models at different stages of the ML modelling pipeline, numerous debiasing approaches have been proposed. Yet, research into how these state-of-the-art approaches can be effectively employed to enhance fairness in educational predictive scenarios remains limited. Prior studies often focused on mitigating bias from a single source at a specific stage of model construction within narrowly defined scenarios, overlooking the complexities of bias originating from multiple sources across various stages. Moreover, these approaches were often evaluated using typical threshold-dependent fairness metrics, which fail to account for real-world educational scenarios where thresholds are typically unknown before evaluation. To bridge these gaps, this study systematically examined 28 representative debiasing approaches, categorized by the sources of bias and the stages they target, on two critical educational predictive tasks: forum post classification and student career prediction. Both tasks involve a two-phase modelling process in which features learned by upstream models in the first phase are fed into classical ML models for final predictions, a common yet under-explored setting for educational data modelling. The study observed that addressing local stereotypical bias, label bias or proxy discrimination in training data, as well as imposing fairness constraints on models, can effectively enhance predictive fairness. However, their efficacy was often compromised when features from upstream models were inherently biased. Beyond that, this study proposes two novel strategies, namely Multi-Stage and Multi-Source debiasing, to integrate existing approaches. These strategies demonstrated substantial improvements in mitigating unfairness, underscoring the importance of unified approaches capable of addressing biases from various sources across multiple stages.
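The abstract's contrast between threshold-dependent and threshold-independent evaluation is easiest to see in code. Below is a minimal, illustrative sketch of one threshold-independent fairness metric, ABROCA (the area between group-wise ROC curves); the abstract does not name the metrics the study used, so the metric choice, the function name and the binary-group assumption are ours.

# Illustrative only: ABROCA, one threshold-independent fairness metric.
# The paper's exact metrics are not named in this abstract; this sketch
# assumes a binary sensitive attribute.
import numpy as np
from sklearn.metrics import auc, roc_curve

def abroca(y_true, y_score, group):
    """Integrate the absolute TPR gap between two groups over a shared
    FPR grid: 0 means the groups' ROC curves coincide, larger is less fair.
    No decision threshold is required, unlike accuracy- or rate-based metrics."""
    y_true, y_score, group = map(np.asarray, (y_true, y_score, group))
    groups = np.unique(group)
    assert len(groups) == 2, "sketch assumes a binary sensitive attribute"
    fpr_grid = np.linspace(0.0, 1.0, 1001)
    tprs = []
    for g in groups:
        mask = group == g
        fpr, tpr, _ = roc_curve(y_true[mask], y_score[mask])
        tprs.append(np.interp(fpr_grid, fpr, tpr))  # each ROC on a common FPR grid
    return auc(fpr_grid, np.abs(tprs[0] - tprs[1]))  # trapezoidal area between curves

Because it compares whole score distributions rather than a single cut-off, such a metric can be computed before any deployment threshold is chosen, which is the situation the abstract describes.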
Practitioner notes
What is already known about this topic
Predictive analytics for educational data modelling often exhibit bias against students from certain demographic groups based on sensitive attributes.
Bias can emerge in training data or models at different stages of the ML modelling pipeline, resulting in unfair final predictions.
Numerous debiasing approaches have been developed to tackle bias at different stages, including pre-processing training data, in-processing models, and post-processing predicted outcomes or trained models.
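To make the pre-processing stage above concrete, here is a minimal sketch of one widely used approach of that kind, Reweighing in the style of Kamiran and Calders (2012). The function name and interface are illustrative, not the paper's implementation.

# Minimal sketch of the pre-processing stage: Reweighing (Kamiran & Calders
# style). Illustrative only, not the paper's implementation.
import numpy as np

def reweighing_weights(s, y):
    """w(s, y) = P(S=s) * P(Y=y) / P(S=s, Y=y): each (group, label) cell is
    re-weighted so the sensitive attribute S and label Y look independent."""
    s, y = np.asarray(s), np.asarray(y)
    w = np.ones(len(y), dtype=float)
    for sv in np.unique(s):
        for yv in np.unique(y):
            cell = (s == sv) & (y == yv)
            if cell.any():
                w[cell] = ((s == sv).mean() * (y == yv).mean()) / cell.mean()
    return w

# The weights plug into any downstream learner that accepts sample weights, e.g.
# LogisticRegression().fit(X, y, sample_weight=reweighing_weights(s, y))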
What this paper adds
A systematic evaluation of 28 state-of-the-art debiasing approaches, covering multiple sources of bias and multiple stages, across two different educational predictive scenarios, identifying the leading sources of data bias that contribute to predictive unfairness.
Two proposed debiasing strategies that account for the multi-source and multi-stage characteristics of bias, further enhancing predictive fairness.
An analysis revealing the potential risks of debiasing focused on a single sensitive attribute.
Implications for practitioners
Pre-processing approaches, particularly those addressing stereotypical bias, label bias and proxy discrimination, are generally effective for improving fairness in educational predictions. Reweighing methods (as sketched above) are especially useful for tackling stereotypical bias in smaller datasets.
When dealing with two-phase modelling, biases inherently encoded in the features generated from upstream models might not be effectively addressed by debiasing approaches applied to downstream models.
Combining debiasing approaches to tackle multiple sources of bias across multiple stages significantly enhances predictive fairness; a minimal sketch of such a combination follows.
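As one hypothetical illustration of what combining stages can look like (not the authors' Multi-Stage or Multi-Source procedure), the sketch below chains the reweighing weights from the earlier example (pre-processing) with group-specific decision thresholds (post-processing). The threshold rule, a demographic-parity heuristic, is our assumption.

# Hypothetical combination of two stages: pre-processing via the
# reweighing_weights() sketch above, then post-processing with per-group
# thresholds so each group's predicted-positive rate matches the base rate.
import numpy as np
from sklearn.linear_model import LogisticRegression

def fit_multistage(X, y, s):
    y, s = np.asarray(y), np.asarray(s)
    model = LogisticRegression(max_iter=1000)
    model.fit(X, y, sample_weight=reweighing_weights(s, y))  # stage 1: pre-processing
    scores = model.predict_proba(X)[:, 1]
    base_rate = y.mean()
    # Stage 2: post-processing -- one threshold per group, placed at the
    # (1 - base_rate) quantile of that group's scores (demographic parity).
    thresholds = {g: np.quantile(scores[s == g], 1.0 - base_rate)
                  for g in np.unique(s)}
    return model, thresholds

def predict_multistage(model, thresholds, X, s):
    scores = model.predict_proba(X)[:, 1]
    return np.array([score >= thresholds[g] for score, g in zip(scores, s)], dtype=int)

The point of the sketch is the shape of the strategy, not the specific components: any pre-processing weighting and any post-processing rule could be slotted into the same two-stage structure.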
Journal introduction:
BJET is a primary source for academics and professionals in the fields of digital educational and training technology throughout the world. The Journal is published by Wiley on behalf of the British Educational Research Association (BERA). It publishes theoretical perspectives, methodological developments and high-quality empirical research that demonstrate whether and how applications of instructional/educational technology systems, networks, tools and resources lead to improvements in formal and non-formal education at all levels, from early years through to higher, technical and vocational education, professional development and corporate training.