Should College Dropout Prediction Models Include Protected Attributes?

Proceedings of the Eighth ACM Conference on Learning @ Scale. Pub Date: 2021-03-28. DOI: 10.1145/3430895.3460139

Renzhe Yu, Hansol Lee, René F. Kizilcec


Citations: 39

Abstract



Early identification of college dropouts can provide tremendous value for improving student success and institutional effectiveness, and predictive analytics are increasingly used for this purpose. However, ethical concerns have emerged about whether including protected attributes in these prediction models discriminates against underrepresented student groups and exacerbates existing inequities. We examine this issue in the context of a large U.S. research university with both residential and fully online degree-seeking students. Based on comprehensive institutional records for the entire student population across multiple years (N = 93,457), we build machine learning models to predict student dropout after one academic year of study and compare the overall performance and fairness of model predictions with and without four protected attributes (gender, underrepresented minority (URM) status, first-generation student status, and high financial need). We find that including protected attributes does not affect overall prediction performance and only marginally improves the algorithmic fairness of predictions. These findings suggest that including protected attributes is preferable. We offer guidance on how to evaluate the impact of including protected attributes in a local context, where institutional stakeholders seek to leverage predictive analytics to support student success.
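The comparison described in the abstract can be illustrated with a minimal sketch. The abstract does not name the fairness metrics used, so this example assumes between-group gaps in accuracy, recall, and true negative rate. The helper names (`group_rates`, `fairness_gaps`), the two groups, and the prediction vectors are all hypothetical toy values made up for illustration; they are not data or results from the paper.

```python
from collections import defaultdict


def group_rates(y_true, y_pred, groups):
    """Per-group accuracy, recall, and true negative rate (label 1 = dropout)."""
    counts = defaultdict(lambda: {"tp": 0, "tn": 0, "fp": 0, "fn": 0})
    for t, p, g in zip(y_true, y_pred, groups):
        # "t"/"f" marks a correct/incorrect prediction, "p"/"n" the predicted class.
        key = ("t" if t == p else "f") + ("p" if p == 1 else "n")
        counts[g][key] += 1
    rates = {}
    for g, c in counts.items():
        pos = c["tp"] + c["fn"]  # actual dropouts in this group
        neg = c["tn"] + c["fp"]  # actual non-dropouts in this group
        if pos == 0 or neg == 0:
            raise ValueError(f"group {g!r} needs both outcome classes")
        rates[g] = {
            "accuracy": (c["tp"] + c["tn"]) / (pos + neg),
            "recall": c["tp"] / pos,
            "tnr": c["tn"] / neg,
        }
    return rates


def fairness_gaps(y_true, y_pred, groups):
    """Largest between-group difference per metric (0.0 means equal across groups)."""
    rates = group_rates(y_true, y_pred, groups)
    return {
        m: max(r[m] for r in rates.values()) - min(r[m] for r in rates.values())
        for m in ("accuracy", "recall", "tnr")
    }


# Hypothetical toy data: 8 students in two groups, with predictions from two
# imagined models (one trained without and one with protected attributes).
y_true = [1, 1, 0, 0, 1, 1, 0, 0]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]
pred_without = [1, 1, 0, 0, 1, 0, 0, 1]
pred_with = [1, 1, 0, 0, 1, 1, 0, 1]

gaps_without = fairness_gaps(y_true, pred_without, groups)
gaps_with = fairness_gaps(y_true, pred_with, groups)
print("without protected attributes:", gaps_without)
print("with protected attributes:   ", gaps_with)
```

In a local evaluation of the kind the abstract recommends, the two prediction vectors would come from models trained on the same institutional records with and without the protected attributes, and the gaps would be read alongside overall performance rather than in isolation.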
