A Comparison of Bias Mitigation Techniques for Educational Classification Tasks Using Supervised Machine Learning

Information · Pub Date: 2024-06-04 · DOI: 10.3390/info15060326
Tarid Wongvorachan, Okan Bulut, Joyce Xinle Liu, Elisabetta Mazzullo
{"title":"A Comparison of Bias Mitigation Techniques for Educational Classification Tasks Using Supervised Machine Learning","authors":"Tarid Wongvorachan, Okan Bulut, Joyce Xinle Liu, Elisabetta Mazzullo","doi":"10.3390/info15060326","DOIUrl":null,"url":null,"abstract":"Machine learning (ML) has become integral in educational decision-making through technologies such as learning analytics and educational data mining. However, the adoption of machine learning-driven tools without scrutiny risks perpetuating biases. Despite ongoing efforts to tackle fairness issues, their application to educational datasets remains limited. To address the mentioned gap in the literature, this research evaluates the effectiveness of four bias mitigation techniques in an educational dataset aiming at predicting students’ dropout rate. The overarching research question is: “How effective are the techniques of reweighting, resampling, and Reject Option-based Classification (ROC) pivoting in mitigating the predictive bias associated with high school dropout rates in the HSLS:09 dataset?\" The effectiveness of these techniques was assessed based on performance metrics including false positive rate (FPR), accuracy, and F1 score. The study focused on the biological sex of students as the protected attribute. The reweighting technique was found to be ineffective, showing results identical to the baseline condition. Both uniform and preferential resampling techniques significantly reduced predictive bias, especially in the FPR metric but at the cost of reduced accuracy and F1 scores. The ROC pivot technique marginally reduced predictive bias while maintaining the original performance of the classifier, emerging as the optimal method for the HSLS:09 dataset. This research extends the understanding of bias mitigation in educational contexts, demonstrating practical applications of various techniques and providing insights for educators and policymakers. By focusing on an educational dataset, it contributes novel insights beyond the commonly studied datasets, highlighting the importance of context-specific approaches in bias mitigation.","PeriodicalId":510156,"journal":{"name":"Information","volume":"9 10","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2024-06-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Information","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.3390/info15060326","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

Machine learning (ML) has become integral to educational decision-making through technologies such as learning analytics and educational data mining. However, adopting machine learning-driven tools without scrutiny risks perpetuating biases. Despite ongoing efforts to tackle fairness issues, their application to educational datasets remains limited. To address this gap in the literature, this research evaluates the effectiveness of four bias mitigation techniques on an educational dataset aimed at predicting students’ dropout. The overarching research question is: “How effective are the techniques of reweighting, resampling, and Reject Option-based Classification (ROC) pivoting in mitigating the predictive bias associated with high school dropout rates in the HSLS:09 dataset?” The effectiveness of these techniques was assessed with performance metrics including false positive rate (FPR), accuracy, and F1 score, with students’ biological sex as the protected attribute. The reweighting technique was found to be ineffective, producing results identical to the baseline condition. Both uniform and preferential resampling significantly reduced predictive bias, especially on the FPR metric, but at the cost of reduced accuracy and F1 scores. The ROC pivot technique marginally reduced predictive bias while maintaining the classifier’s original performance, emerging as the optimal method for the HSLS:09 dataset. This research extends the understanding of bias mitigation in educational contexts, demonstrating practical applications of various techniques and providing insights for educators and policymakers. By focusing on an educational dataset, it contributes novel insights beyond the commonly studied datasets, highlighting the importance of context-specific approaches to bias mitigation.
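To make the techniques named in the abstract concrete, the sketch below shows (1) the Kamiran–Calders reweighting scheme that is commonly used to implement the reweighting approach, and (2) a group-wise false positive rate check of the kind used to assess predictive bias. This is a minimal illustration, not the authors’ pipeline: the column names (`sex`, `dropout`), the synthetic data, and the logistic-regression classifier are assumptions standing in for an HSLS:09-style extract.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split


def reweighing_weights(df: pd.DataFrame, protected: str = "sex", label: str = "dropout") -> np.ndarray:
    """Kamiran-Calders weights: w(a, y) = P(A=a) * P(Y=y) / P(A=a, Y=y).

    Each (group, label) cell is weighted as if the protected attribute
    and the label were statistically independent.
    """
    w = np.ones(len(df))
    for a in df[protected].unique():
        for y in df[label].unique():
            cell = ((df[protected] == a) & (df[label] == y)).to_numpy()
            if cell.any():
                w[cell] = (df[protected] == a).mean() * (df[label] == y).mean() / cell.mean()
    return w


def fpr_by_group(group: np.ndarray, y_true: np.ndarray, y_pred: np.ndarray) -> dict:
    """False positive rate computed separately for each protected group."""
    rates = {}
    for a in np.unique(group):
        m = group == a
        tn, fp, _, _ = confusion_matrix(y_true[m], y_pred[m], labels=[0, 1]).ravel()
        rates[a] = fp / (fp + tn) if (fp + tn) else float("nan")
    return rates


# Hypothetical data standing in for an HSLS:09-style extract:
# binary protected attribute "sex", binary label "dropout", two numeric features.
rng = np.random.default_rng(0)
n = 2000
data = pd.DataFrame({
    "sex": rng.integers(0, 2, n),
    "gpa": rng.normal(3.0, 0.5, n),
    "absences": rng.poisson(4, n),
})
# Build in a group-dependent dropout rate so there is bias to mitigate.
data["dropout"] = (rng.random(n) < 0.15 + 0.10 * data["sex"]).astype(int)

train, test = train_test_split(data, test_size=0.3, random_state=42, stratify=data["dropout"])
features = ["sex", "gpa", "absences"]

clf = LogisticRegression(max_iter=1000)
clf.fit(train[features], train["dropout"], sample_weight=reweighing_weights(train))
preds = clf.predict(test[features])

print("FPR by sex:", fpr_by_group(test["sex"].to_numpy(), test["dropout"].to_numpy(), preds))
```

The ROC pivot examined in the paper is, by contrast, a post-processing step: predictions whose scores fall within a margin around the decision threshold are flipped toward the favorable outcome for the unprivileged group and the unfavorable outcome for the privileged group (AIF360, for example, provides this as RejectOptionClassification).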