When the Past != The Future: Assessing the Impact of Dataset Drift on the Fairness of Learning Analytics Models

Impact Factor: 2.9 · CAS Region 3 (Education) · JCR Q2, Computer Science, Interdisciplinary Applications
Oscar Blessed Deho;Lin Liu;Jiuyong Li;Jixue Liu;Chen Zhan;Srecko Joksimovic
{"title":"When the Past != The Future: Assessing the Impact of Dataset Drift on the Fairness of Learning Analytics Models","authors":"Oscar Blessed Deho;Lin Liu;Jiuyong Li;Jixue Liu;Chen Zhan;Srecko Joksimovic","doi":"10.1109/TLT.2024.3351352","DOIUrl":null,"url":null,"abstract":"Learning analytics (LA), like much of machine learning, assumes the training and test datasets come from the same distribution. Therefore, LA models built on past observations are (implicitly) expected to work well for future observations. However, this assumption does not always hold in practice because the dataset may drift. Recently, algorithmic fairness has gained significant attention. Nevertheless, algorithmic fairness research has paid little attention to dataset drift. Majority of the existing fairness algorithms are “statically” designed. Put another way, LA models \n<italic>tuned</i>\n to be “fair” on past data are expected to still be “fair” when dealing with current/future data. However, it is counter-intuitive to deploy a \n<italic>statically</i>\n fair algorithm to a \n<italic>nonstationary</i>\n world. There is, therefore, a need to assess the impact of dataset drift on the unfairness of LA models. For this reason, we investigate the relationship between dataset drift and unfairness of LA models. Specifically, we first measure the degree of drift in the features (i.e., covariates) and target label of our dataset. After that, we train predictive models on the dataset and evaluate the relationship between the dataset drift and the unfairness of the predictive models. Our findings suggest a directly proportional relationship between dataset drift and unfairness. Further, we find covariate drift to have the most impact on unfairness of models as compared to target drift, and there are no guarantees that a once fair model would consistently remain fair. Our findings imply that “robustness” of fair LA models to dataset drift is necessary before deployment.","PeriodicalId":49191,"journal":{"name":"IEEE Transactions on Learning Technologies","volume":"17 ","pages":"1007-1020"},"PeriodicalIF":2.9000,"publicationDate":"2024-01-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Learning Technologies","FirstCategoryId":"95","ListUrlMain":"https://ieeexplore.ieee.org/document/10384787/","RegionNum":3,"RegionCategory":"教育学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS","Score":null,"Total":0}
Citations: 0

Abstract

Learning analytics (LA), like much of machine learning, assumes that the training and test datasets come from the same distribution. Therefore, LA models built on past observations are (implicitly) expected to work well for future observations. However, this assumption does not always hold in practice because the dataset may drift. Recently, algorithmic fairness has gained significant attention; nevertheless, algorithmic fairness research has paid little attention to dataset drift. The majority of existing fairness algorithms are "statically" designed: LA models tuned to be "fair" on past data are expected to still be "fair" when dealing with current or future data. However, it is counterintuitive to deploy a statically fair algorithm in a nonstationary world. There is, therefore, a need to assess the impact of dataset drift on the unfairness of LA models. For this reason, we investigate the relationship between dataset drift and the unfairness of LA models. Specifically, we first measure the degree of drift in the features (i.e., covariates) and the target label of our dataset. We then train predictive models on the dataset and evaluate the relationship between the dataset drift and the unfairness of the predictive models. Our findings suggest a directly proportional relationship between dataset drift and unfairness. Further, we find that covariate drift has a greater impact on model unfairness than target drift, and that there are no guarantees that a once-fair model will remain fair. Our findings imply that fair LA models must be "robust" to dataset drift before deployment.
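The abstract outlines a three-step analysis: quantify covariate and target-label drift between a past and a current student cohort, train predictive models on the past cohort, and compare their fairness on both cohorts. The paper itself does not include code, so the sketch below is only illustrative of that kind of analysis, not the authors' method: it assumes the two-sample Kolmogorov–Smirnov statistic as a covariate-drift measure, the shift in positive-label rate as a target-drift measure, and the demographic parity gap as an unfairness measure; all column names (gpa, logins, passed, gender) are hypothetical.

```python
# Illustrative sketch (not from the paper): measure dataset drift between two
# cohorts and check whether a model trained on the past cohort stays "fair"
# on the current one. Column names are hypothetical placeholders.
import pandas as pd
from scipy.stats import ks_2samp
from sklearn.linear_model import LogisticRegression


def covariate_drift(past: pd.DataFrame, current: pd.DataFrame, features):
    """Per-feature drift via the two-sample Kolmogorov-Smirnov statistic."""
    return {f: ks_2samp(past[f], current[f]).statistic for f in features}


def target_drift(past: pd.Series, current: pd.Series) -> float:
    """Drift in a binary label, measured as the shift in the positive rate."""
    return abs(past.mean() - current.mean())


def demographic_parity_gap(y_pred, group) -> float:
    """Largest difference in predicted positive rates across group values."""
    rates = pd.Series(y_pred).groupby(pd.Series(group).values).mean()
    return float(rates.max() - rates.min())


def fairness_under_drift(past: pd.DataFrame, current: pd.DataFrame,
                         features, label="passed", group="gender"):
    # Train on the past cohort only, as the abstract's setup implies.
    model = LogisticRegression(max_iter=1000).fit(past[features], past[label])
    gap_past = demographic_parity_gap(model.predict(past[features]), past[group])
    gap_now = demographic_parity_gap(model.predict(current[features]), current[group])
    return {
        "covariate_drift": covariate_drift(past, current, features),
        "target_drift": target_drift(past[label], current[label]),
        "fairness_gap_past_cohort": gap_past,
        "fairness_gap_current_cohort": gap_now,
    }
```

Comparing the fairness gap on the past and current cohorts in this way is one simple operationalization of the abstract's question of whether a "once fair" model remains fair once the data drift.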
Source journal

IEEE Transactions on Learning Technologies (Computer Science, Interdisciplinary Applications)
CiteScore: 7.50
Self-citation rate: 5.40%
Articles published per year: 82
Review time: >12 weeks
Journal description: The IEEE Transactions on Learning Technologies covers all advances in learning technologies and their applications, including but not limited to the following topics: innovative online learning systems; intelligent tutors; educational games; simulation systems for education and training; collaborative learning tools; learning with mobile devices; wearable devices and interfaces for learning; personalized and adaptive learning systems; tools for formative and summative assessment; tools for learning analytics and educational data mining; ontologies for learning systems; standards and web services that support learning; authoring tools for learning materials; computer support for peer tutoring; learning via computer-mediated inquiry, field, and lab work; social learning techniques; social networks and infrastructures for learning and knowledge sharing; and creation and management of learning objects.