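Protected Attributes Tell Us Who, Behavior Tells Us How: A Comparison of Demographic and Behavioral Oversampling for Fair Student Success Modeling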

LAK23: 13th International Learning Analytics and Knowledge Conference. Pub Date: 2022-12-20. DOI: 10.1145/3576050.3576149

J. Cock, Muhammad Bilal, Richard Davis, M. Marras, Tanja Kaser


Citations: 0

Abstract

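Algorithms deployed in education can shape the learning experience and success of a student. It is therefore important to understand whether and how such algorithms might create inequalities or amplify existing biases. In this paper, we analyze the fairness of models which use behavioral data to identify at-risk students and suggest two novel pre-processing approaches for bias mitigation. Based on the concept of intersectionality, the first approach involves intelligent oversampling on combinations of demographic attributes. The second approach does not require any knowledge of demographic attributes and is based on the assumption that such attributes are a (noisy) proxy for student behavior. We hence propose to directly oversample different types of behaviors identified in a cluster analysis. We evaluate our approaches on data from (i) an open-ended learning environment and (ii) a flipped classroom course. Our results show that both approaches can mitigate model bias. Directly oversampling on behavior is a valuable alternative when demographic metadata is not available. Source code and extended results are provided in https://github.com/epfl-ml4ed/behavioral-oversampling.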
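To make the two pre-processing ideas in the abstract concrete, the following is a minimal sketch, not the authors' implementation (their code is in the repository linked above). It assumes plain random oversampling with replacement until every group matches the largest one, which is simpler than the "intelligent" oversampling the paper describes. The function name `oversample_by_group`, the toy student IDs, the demographic tuples, and the cluster labels are all hypothetical. The same routine serves both variants: the grouping key is either a combination of demographic attributes or a behavior cluster label obtained from a separate cluster analysis.

```python
import random
from collections import defaultdict


def oversample_by_group(samples, group_keys, seed=0):
    """Duplicate samples at random so each group reaches the largest group's size.

    Simplifying assumption: plain random oversampling with replacement,
    used here only to illustrate the idea of balancing groups.
    """
    if len(samples) != len(group_keys):
        raise ValueError("samples and group_keys must have the same length")
    if not samples:
        return []

    rng = random.Random(seed)

    # Collect the members of each group.
    by_group = defaultdict(list)
    for sample, key in zip(samples, group_keys):
        by_group[key].append(sample)

    # Every group is grown to the size of the largest group.
    target = max(len(members) for members in by_group.values())

    balanced = []
    for key, members in by_group.items():
        extra = [rng.choice(members) for _ in range(target - len(members))]
        balanced.extend((key, s) for s in members + extra)
    return balanced


# Hypothetical toy data: six students.
students = ["s1", "s2", "s3", "s4", "s5", "s6"]

# Variant 1: group by a combination of demographic attributes
# (hypothetical gender and country codes).
demographics = [("F", "FR"), ("F", "FR"), ("F", "FR"),
                ("M", "FR"), ("M", "DE"), ("F", "FR")]
by_demo = oversample_by_group(students, demographics)

# Variant 2: group by behavior cluster labels, assumed to come from
# a clustering step that is not shown here. No demographics needed.
clusters = [0, 0, 0, 0, 1, 1]
by_behavior = oversample_by_group(students, clusters)
```

Each returned item is a `(group_key, sample)` pair, so the balanced set can be passed to model training while the group membership stays available for a later fairness evaluation.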


