Predictive Modeling of Student Dropout in MOOCs and Self-Regulated Learning

IF 2.6 Q2 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS
Georgios Psathas, Theano K. Chatzidaki, Stavros N. Demetriadis
{"title":"Predictive Modeling of Student Dropout in MOOCs and Self-Regulated Learning","authors":"Georgios Psathas, Theano K. Chatzidaki, Stavros N. Demetriadis","doi":"10.3390/computers12100194","DOIUrl":null,"url":null,"abstract":"The primary objective of this study is to examine the factors that contribute to the early prediction of Massive Open Online Courses (MOOCs) dropouts in order to identify and support at-risk students. We utilize MOOC data of specific duration, with a guided study pace. The dataset exhibits class imbalance, and we apply oversampling techniques to ensure data balancing and unbiased prediction. We examine the predictive performance of five classic classification machine learning (ML) algorithms under four different oversampling techniques and various evaluation metrics. Additionally, we explore the influence of self-reported self-regulated learning (SRL) data provided by students and various other prominent features of MOOCs as potential indicators of early stage dropout prediction. The research questions focus on (1) the performance of the classic classification ML models using various evaluation metrics before and after different methods of oversampling, (2) which self-reported data may constitute crucial predictors for dropout propensity, and (3) the effect of the SRL factor on the dropout prediction performance. The main conclusions are: (1) prominent predictors, including employment status, frequency of chat tool usage, prior subject-related experiences, gender, education, and willingness to participate, exhibit remarkable efficacy in achieving high to excellent recall performance, particularly when specific combinations of algorithms and oversampling methods are applied, (2) self-reported SRL factor, combined with easily provided/self-reported features, performed well as a predictor in terms of recall when LR and SVM algorithms were employed, (3) it is crucial to test diverse machine learning algorithms and oversampling methods in predictive modeling.","PeriodicalId":46292,"journal":{"name":"Computers","volume":"139 1","pages":"0"},"PeriodicalIF":2.6000,"publicationDate":"2023-09-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Computers","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.3390/computers12100194","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS","Score":null,"Total":0}
引用次数: 1

Abstract

The primary objective of this study is to examine the factors that contribute to the early prediction of Massive Open Online Courses (MOOCs) dropouts in order to identify and support at-risk students. We utilize MOOC data of specific duration, with a guided study pace. The dataset exhibits class imbalance, and we apply oversampling techniques to ensure data balancing and unbiased prediction. We examine the predictive performance of five classic classification machine learning (ML) algorithms under four different oversampling techniques and various evaluation metrics. Additionally, we explore the influence of self-reported self-regulated learning (SRL) data provided by students and various other prominent features of MOOCs as potential indicators of early stage dropout prediction. The research questions focus on (1) the performance of the classic classification ML models using various evaluation metrics before and after different methods of oversampling, (2) which self-reported data may constitute crucial predictors for dropout propensity, and (3) the effect of the SRL factor on the dropout prediction performance. The main conclusions are: (1) prominent predictors, including employment status, frequency of chat tool usage, prior subject-related experiences, gender, education, and willingness to participate, exhibit remarkable efficacy in achieving high to excellent recall performance, particularly when specific combinations of algorithms and oversampling methods are applied, (2) self-reported SRL factor, combined with easily provided/self-reported features, performed well as a predictor in terms of recall when LR and SVM algorithms were employed, (3) it is crucial to test diverse machine learning algorithms and oversampling methods in predictive modeling.
mooc学生退学预测模型与自主学习
本研究的主要目的是研究影响大规模在线开放课程(MOOCs)辍学早期预测的因素,以便识别和支持有风险的学生。我们利用特定时长的MOOC数据,指导学习节奏。数据集表现出类别不平衡,我们采用过采样技术来确保数据平衡和无偏预测。我们在四种不同的过采样技术和各种评估指标下研究了五种经典分类机器学习(ML)算法的预测性能。此外,我们还探讨了学生提供的自我报告自我调节学习(SRL)数据以及mooc的各种其他突出特征作为早期辍学预测的潜在指标的影响。研究问题集中在(1)使用不同过采样方法前后不同评价指标的经典分类ML模型的性能,(2)自我报告数据可能构成辍学倾向的关键预测因子,以及(3)SRL因素对辍学预测性能的影响。主要结论是:(1)突出的预测因素,包括就业状况、聊天工具使用频率、先前的主题相关经验、性别、教育程度和参与意愿,在实现高到优秀的召回性能方面表现出显著的有效性,特别是当应用算法和过采样方法的特定组合时;(2)自我报告的SRL因素,结合容易提供/自我报告的特征;当使用LR和SVM算法时,在召回率方面表现良好,(3)在预测建模中测试不同的机器学习算法和过采样方法至关重要。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
Computers
Computers COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS-
CiteScore
5.40
自引率
3.60%
发文量
153
审稿时长
11 weeks
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信