Methodological Considerations for Predicting At-risk Students

Charles Koutcheme, Sami Sarsa, Arto Hellas, Lassi Haaranen, Juho Leinonen
Proceedings of the 24th Australasian Computing Education Conference. Published 2022-02-14.
DOI: 10.1145/3511861.3511873 (https://doi.org/10.1145/3511861.3511873)
Citations: 5

Abstract

Educational researchers have long sought to increase student retention. One stream of this research seeks to automatically identify students who are at risk of dropping out. Studies tend to agree that earlier identification of at-risk students is better, as it leaves more room for targeted interventions. We examine the interplay between data and the predictive power of machine learning models used to identify at-risk students. We critically examine the often-used approach in which data collected from weeks 1, 2, ..., n is used to predict whether a student becomes inactive in a subsequent week w, w ≥ n + 1, and point out issues with this approach that may inflate models’ predictive power. Specifically, our empirical analysis highlights that, when n > 1, including students who became inactive in week n or earlier in the data used to identify students who are inactive in the following weeks is a significant cause of bias. Including students who dropped out during the first week makes the problem significantly easier, since they have no data in the subsequent weeks. Based on our results, we recommend including only students who are active until week n when building and evaluating models for predicting dropouts in subsequent weeks, and evaluating and reporting the particularities of the respective course contexts.
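The cohort-selection recommendation can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the toy data, the function names last_active_week and build_cohort, the definition of "active until week n" as having a last active week of at least n, and the label (1 if the student stops before the final course week). It contrasts the naive cohort, which keeps students who have already gone inactive, with the filtered cohort the abstract recommends.

```python
from typing import Dict, List, Tuple


def last_active_week(activity: List[int]) -> int:
    """Return the 1-indexed last week with any activity, or 0 if there is none."""
    last = 0
    for week, count in enumerate(activity, start=1):
        if count > 0:
            last = week
    return last


def build_cohort(
    weekly_activity: Dict[str, List[int]],
    n: int,
    only_active: bool = True,
) -> List[Tuple[str, List[int], int]]:
    """Build (student, features, label) rows for predicting inactivity after week n.

    Assumptions (hypothetical, for illustration only):
    - features are the activity counts from weeks 1..n;
    - a student is "active until week n" if their last active week is >= n;
    - label is 1 if the student stops before the final course week, else 0.
    """
    rows = []
    for student, activity in weekly_activity.items():
        last = last_active_week(activity)
        if only_active and last < n:
            # Already inactive before week n: the outcome is known, so skip.
            continue
        features = activity[:n]
        label = 1 if last < len(activity) else 0
        rows.append((student, features, label))
    return rows


# Toy data: activity counts per week for a four-week course.
weekly_activity = {
    "s1": [5, 4, 6, 3],  # active in all four weeks
    "s2": [3, 0, 0, 0],  # inactive after week 1
    "s3": [4, 2, 0, 0],  # inactive after week 2
    "s4": [2, 3, 1, 0],  # inactive after week 3
}

naive = build_cohort(weekly_activity, n=2, only_active=False)
filtered = build_cohort(weekly_activity, n=2, only_active=True)

print("naive cohort:   ", [student for student, _, _ in naive])
print("filtered cohort:", [student for student, _, _ in filtered])
```

In the naive cohort, s2 already shows zero activity in week 2, so a model can label that student as a dropout almost trivially, which is the kind of inflation the abstract warns about. The filtered cohort keeps only students whose outcome is still open at week n.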