An Explainable Machine Learning Approach to Predicting and Understanding Dropouts in MOOCs

Erkan Er
{"title":"An Explainable Machine Learning Approach to Predicting and Understanding Dropouts in MOOCs","authors":"Erkan Er","doi":"10.24106/kefdergi.1246458","DOIUrl":null,"url":null,"abstract":"Purpose: The purpose of this study is to predict dropouts in two runs of the same MOOC using an explainable machine learning approach. With the explainable approach, we aim to enable the interpretation of the black-box predictive models from a pedagogical perspective and to produce actionable insights for related educational interventions. The similarity and the differences in feature importance between the predictive models were also examined. \nDesign/Methodology/Approach: This is a quantitative study performed on a large public dataset containing activity logs in a MOOC. In total, 21 features were generated and standardized before the analysis. Multi-layer perceptron neural network was used as the black-box machine learning algorithm to build the predictive models. The model performances were evaluated using the accuracy and AUC metrics. SHAP was used to obtain explainable results about the effects of different features on students’ success or failure. \nFindings: According to the results, the predictive models were quite accurate, showing the capacity of the features generated in capturing student engagement. With the SHAP approach, reasons for dropouts for the whole class, as well as for specific students were identified. While mostly disengagement in assignments and course wares caused dropouts in both course runs, interaction with video (the main teaching component) showed a limited predictive power. In total six features were common strong predictors in both runs, and the remaining four features belonged to only one run. Moreover, using waterfall plots, the reasons for predictions pertaining to two randomly chosen students were explored. The results showed that dropouts might be explained by different predictions for each student, and the variables associated with dropouts might be different than the predictions conducted for the whole course. \nHighlights: This study illustrated the use of an explainable machine learning approach called SHAP to interpret the underlying reasons for dropout predictions. Such explainable approaches offer a promising direction for creating timely class-wide interventions as well as for providing personalized support for tailored to specific students. Moreover, this study provides strong evidence that transferring predictive models between different contexts is less like to be successful.","PeriodicalId":33167,"journal":{"name":"Kastamonu Egitim Dergisi","volume":"1 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2023-01-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Kastamonu Egitim Dergisi","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.24106/kefdergi.1246458","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 1

Abstract

Purpose: The purpose of this study is to predict dropouts in two runs of the same MOOC using an explainable machine learning approach. With this approach, we aim to enable the interpretation of black-box predictive models from a pedagogical perspective and to produce actionable insights for educational interventions. The similarities and differences in feature importance between the predictive models were also examined.

Design/Methodology/Approach: This is a quantitative study performed on a large public dataset containing activity logs from a MOOC. In total, 21 features were generated and standardized before the analysis. A multi-layer perceptron neural network was used as the black-box machine learning algorithm to build the predictive models. Model performance was evaluated using accuracy and AUC. SHAP was used to obtain explainable results about the effects of different features on students' success or failure.

Findings: The predictive models were quite accurate, showing the capacity of the generated features to capture student engagement. With the SHAP approach, reasons for dropout were identified both for the whole class and for specific students. Disengagement from assignments and courseware was the main driver of dropout in both course runs, while interaction with videos (the main teaching component) showed limited predictive power. In total, six features were strong predictors common to both runs, and the remaining four strong predictors belonged to only one run. Moreover, using waterfall plots, the reasons behind the predictions for two randomly chosen students were explored. The results showed that dropout might be explained by different factors for each student, and the variables associated with an individual student's dropout might differ from those identified for the whole course.

Highlights: This study illustrated the use of an explainable machine learning approach, SHAP, to interpret the underlying reasons for dropout predictions. Such explainable approaches offer a promising direction for creating timely class-wide interventions as well as for providing personalized support tailored to specific students. Moreover, this study provides strong evidence that transferring predictive models between different contexts is less likely to be successful.
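To make the modeling pipeline in the abstract concrete, the following is a minimal sketch (not the authors' code) of the described workflow: standardized engagement features feeding a multi-layer perceptron, evaluated with accuracy and AUC. The feature matrix, labels, and network architecture here are illustrative assumptions; the paper's actual 21 features and hyperparameters are not given in the abstract.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import accuracy_score, roc_auc_score

# Hypothetical data: rows are students, columns stand in for the 21
# engagement features (e.g., assignment, courseware, and video activity);
# y is 1 for dropout, 0 for completion.
rng = np.random.default_rng(42)
X = rng.normal(size=(1000, 21))
y = (rng.random(1000) < 0.3).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Standardize features before training, as the abstract notes.
scaler = StandardScaler().fit(X_train)
X_train_std = scaler.transform(X_train)
X_test_std = scaler.transform(X_test)

# The black-box model: a multi-layer perceptron classifier.
# This architecture is an illustrative guess, not the paper's.
mlp = MLPClassifier(hidden_layer_sizes=(64, 32), max_iter=500, random_state=42)
mlp.fit(X_train_std, y_train)

# Evaluate with the two metrics named in the abstract.
proba = mlp.predict_proba(X_test_std)[:, 1]
print("Accuracy:", accuracy_score(y_test, proba > 0.5))
print("AUC:", roc_auc_score(y_test, proba))
```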
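The explainability step can be sketched the same way. The snippet below, continuing from the model above, uses Kernel SHAP (which treats the MLP purely as a black box) to produce a class-wide importance summary and a per-student waterfall plot, analogous to the global and individual explanations the abstract describes. The background-sample size and the choice of student are arbitrary assumptions.

```python
import shap

# A small background sample keeps Kernel SHAP tractable.
background = shap.sample(X_train_std, 100)
explainer = shap.KernelExplainer(lambda x: mlp.predict_proba(x)[:, 1], background)

# Global view: SHAP values across a test sample, summarizing which
# features push predictions toward dropout for the class as a whole.
sample = X_test_std[:50]
shap_values = explainer.shap_values(sample)
shap.summary_plot(shap_values, sample)

# Local view: a waterfall plot for a single student, as the paper uses
# for the two randomly chosen students.
explanation = shap.Explanation(
    values=shap_values[0],
    base_values=explainer.expected_value,
    data=sample[0],
)
shap.plots.waterfall(explanation)
```

Whether a feature's contribution differs between the two course runs can then be checked by comparing the summary plots of models trained on each run, which is the comparison underlying the transferability finding.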