When the Past != The Future: Assessing the Impact of Dataset Drift on the Fairness of Learning Analytics Models
Oscar Blessed Deho, Lin Liu, Jiuyong Li, Jixue Liu, Chen Zhan, Srecko Joksimovic
IEEE Transactions on Learning Technologies, vol. 17, pp. 1007-1020, published 2024-01-09
DOI: 10.1109/TLT.2024.3351352
URL: https://ieeexplore.ieee.org/document/10384787/
Citations: 0
Abstract
Learning analytics (LA), like much of machine learning, assumes that the training and test datasets come from the same distribution. LA models built on past observations are therefore (implicitly) expected to work well for future observations. However, this assumption does not always hold in practice because the dataset may drift. Recently, algorithmic fairness has gained significant attention, yet fairness research has paid little attention to dataset drift. The majority of existing fairness algorithms are "statically" designed: LA models tuned to be "fair" on past data are expected to remain "fair" when dealing with current or future data. It is counterintuitive, however, to deploy a statically fair algorithm in a nonstationary world. There is, therefore, a need to assess the impact of dataset drift on the unfairness of LA models. To this end, we investigate the relationship between dataset drift and the unfairness of LA models. Specifically, we first measure the degree of drift in the features (i.e., covariates) and target label of our dataset. We then train predictive models on the dataset and evaluate the relationship between dataset drift and the unfairness of those models. Our findings suggest a directly proportional relationship between dataset drift and unfairness. Further, we find that covariate drift has a greater impact on model unfairness than target drift, and that there is no guarantee that a once-fair model will remain fair. Our findings imply that "robustness" of fair LA models to dataset drift is necessary before deployment.
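To make the evaluation pipeline described above concrete, the following is a minimal sketch, not the authors' code, of how one might quantify the three quantities the abstract relates: covariate drift, target drift, and model unfairness. The synthetic make_cohort generator, the per-feature Kolmogorov-Smirnov statistic as the covariate-drift measure, the shift in positive-label rate as the target-drift measure, and demographic parity difference as the unfairness measure are all illustrative assumptions, not necessarily the paper's exact choices.

```python
# Hedged sketch: measure dataset drift and model unfairness on synthetic data.
# All modeling choices here (KS statistic, label-rate shift, demographic
# parity difference, logistic regression) are assumptions for illustration.
import numpy as np
from scipy.stats import ks_2samp
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

def make_cohort(n, shift=0.0):
    """Synthetic cohort: two features, a binary sensitive attribute, a label."""
    group = rng.integers(0, 2, n)                       # sensitive attribute
    x = rng.normal(loc=shift + 0.3 * group, size=(n, 2))
    y = (x.sum(axis=1) + rng.normal(scale=0.5, size=n) > 0).astype(int)
    return x, group, y

def covariate_drift(x_past, x_now):
    """Mean per-feature Kolmogorov-Smirnov statistic between two cohorts."""
    return np.mean([ks_2samp(x_past[:, j], x_now[:, j]).statistic
                    for j in range(x_past.shape[1])])

def target_drift(y_past, y_now):
    """Absolute change in the positive-label rate."""
    return abs(y_past.mean() - y_now.mean())

def demographic_parity_diff(model, x, group):
    """|P(yhat=1 | group=0) - P(yhat=1 | group=1)| on the given cohort."""
    pred = model.predict(x)
    return abs(pred[group == 0].mean() - pred[group == 1].mean())

# Train once on the "past" cohort, then evaluate on increasingly drifted ones.
x_past, g_past, y_past = make_cohort(5000)
model = LogisticRegression().fit(x_past, y_past)

for shift in (0.0, 0.5, 1.0):
    x_now, g_now, y_now = make_cohort(5000, shift=shift)
    print(f"shift={shift:.1f}  "
          f"covariate_drift={covariate_drift(x_past, x_now):.3f}  "
          f"target_drift={target_drift(y_past, y_now):.3f}  "
          f"unfairness={demographic_parity_diff(model, x_now, g_now):.3f}")
```

Running the loop with increasing shift shows the drift measures growing; whether the unfairness of the once-trained model grows along with them depends on the data-generating process, which is precisely the empirical question the paper investigates on real LA data.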
Journal Description
The IEEE Transactions on Learning Technologies covers all advances in learning technologies and their applications, including but not limited to the following topics: innovative online learning systems; intelligent tutors; educational games; simulation systems for education and training; collaborative learning tools; learning with mobile devices; wearable devices and interfaces for learning; personalized and adaptive learning systems; tools for formative and summative assessment; tools for learning analytics and educational data mining; ontologies for learning systems; standards and web services that support learning; authoring tools for learning materials; computer support for peer tutoring; learning via computer-mediated inquiry, field, and lab work; social learning techniques; social networks and infrastructures for learning and knowledge sharing; and creation and management of learning objects.