Predicting Master's students' academic performance: an empirical study in Germany.

IF 12.1 Q1 EDUCATION & EDUCATIONAL RESEARCH
Smart Learning Environments Pub Date : 2022-01-01 Epub Date: 2022-12-23 DOI:10.1186/s40561-022-00220-y
Sarah Alturki, Lea Cohausz, Heiner Stuckenschmidt
Publication type: Journal Article. Pages: 38. Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9786516/pdf/
Citations: 0

Abstract

The tremendous growth in electronic educational data creates the need to extract meaningful information from it. Educational Data Mining (EDM) is an exciting research area that can reveal valuable knowledge from educational databases. This knowledge can be used for many purposes, including identifying dropouts or weak students who need special attention and discovering extraordinary students who can be offered lifetime opportunities. Although former studies in EDM used an extensive range of features for predicting students' academic achievement (in terms of (i) achieved grades or (ii) passing and failing), those features are sometimes not obtainable in practice, and therefore the prediction models are not feasible to deploy. This study uses data mining (DM) algorithms to predict the academic performance of master's students using a non-extensive data set that includes only features that are easy to collect at the beginning of a study program. To perform this study, we collected over 700 student records from 2010 to 2018 from the Faculty of Business Informatics and Mathematics at the University of Mannheim in Germany. Those records include demographics and post-enrollment features such as semester grades. The empirical results show the following: (i) The most significant features for predicting students' academic achievements are the students' grades in each semester (importance rate between 14 and 36%), followed by the distance from students' accommodation to the university (importance rate between 6 and 18%) and culture (importance rate between 7 and 17%). In contrast, gender, age, the number of failed courses, and the number of registered and unregistered exams per semester are less significant for the predictions. (ii) As expected, predictions made after the second semester are more accurate than those made after the first semester. (iii) Unsurprisingly, models that predict two classes yield better results than those that predict three. (iv) The Random Forest classifier performs best in all prediction models (0.77-0.94 accuracy), and using oversampling methods to deal with imbalanced data can significantly improve the performance of DM methods. For future work, we recommend testing the predictive models on other master's programs and on larger datasets. Furthermore, we recommend investigating other oversampling approaches.
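The abstract credits oversampling with improving results on imbalanced classes but does not say which method was used. As a rough illustration only, the sketch below shows plain random oversampling, which is an assumption and not necessarily the authors' approach. The function name `random_oversample`, the toy records, and the feature choice (a semester grade and a distance in km) are all hypothetical.

```python
import random
from collections import Counter


def random_oversample(records, labels, seed=0):
    """Duplicate randomly chosen minority-class records until every class
    has as many examples as the largest class (hypothetical helper)."""
    rng = random.Random(seed)

    # Group the records by their class label.
    by_class = {}
    for rec, lab in zip(records, labels):
        by_class.setdefault(lab, []).append(rec)

    # Every class is grown to the size of the largest one.
    target = max(len(recs) for recs in by_class.values())

    out_records, out_labels = [], []
    for lab, recs in by_class.items():
        # Sample with replacement to fill the gap; originals are all kept.
        extra = [rng.choice(recs) for _ in range(target - len(recs))]
        for rec in recs + extra:
            out_records.append(rec)
            out_labels.append(lab)
    return out_records, out_labels


# Hypothetical toy data: (first-semester grade, distance to campus in km).
records = [(1.3, 2.0), (1.7, 5.0), (2.0, 3.5), (2.3, 10.0), (3.7, 40.0), (4.0, 25.0)]
labels = ["pass", "pass", "pass", "pass", "fail", "fail"]

balanced_records, balanced_labels = random_oversample(records, labels)
counts = Counter(balanced_labels)
print(counts)
```

In practice, resampling of this kind is usually applied to the training split only, so that duplicated records do not leak into the evaluation data and inflate the reported accuracy.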

Source journal: Smart Learning Environments (Social Sciences, Education)
CiteScore: 13.20
Self-citation rate: 2.10%
Articles published: 29
Review time: 19 weeks