Using Ensemble Decision Tree Model to Predict Student Dropout in Computing Science

M. Naseem, K. Chaudhary, B. Sharma, Aman Goel Lal
2019 IEEE Asia-Pacific Conference on Computer Science and Data Engineering (CSDE), December 2019. DOI: 10.1109/CSDE48274.2019.9162389
Citations: 9

Abstract

Science, Technology, Engineering and Mathematics (STEM) professionals play a key role in the development of an economy. STEM workers are critical thinkers who contribute immensely by driving innovation. There is high demand for professionals in the STEM fields, but there is also a shortage of human resources in these areas. One way to reduce this problem is to identify students who are at risk of dropping out and then intervene with focused strategies aimed at keeping these students in the same programme until graduation. Therefore, this research uses a data mining classification technique to identify students who are at risk of dropping out of their Computing Science (CS) degree programmes. The Random Forest (RF) algorithm, an ensemble of decision trees, is used to learn patterns from historical data about first-year undergraduate CS students enrolled at a tertiary institution in the South Pacific. The factors used comprise students' demographic information, previous educational background, financial information, and data about students' academic interaction. Feature selection is performed to determine which factors have the greatest influence on students' decisions to drop out. Cross-validation is used to check that the models are not over-fitted. Two models were built, one using 5-fold and one using 10-fold cross-validation, and their results were compared using several measures of model performance. The results show that the factors corresponding to students' academic performance in a first-year programming course had the greatest impact on student attrition in CS.
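The abstract names the ingredients of the pipeline (a Random Forest classifier, a ranking of which factors matter, and a comparison of 5-fold and 10-fold cross-validation) but not the toolkit, hyperparameters, or exact metrics. The following is a minimal pure-Python sketch of such a pipeline under stated assumptions: a binary label (1 = dropped out), 15 trees of depth 4, roughly the square root of the number of features tried at each split, Gini impurity, mean-decrease-in-impurity as a stand-in for the paper's feature selection, and accuracy/precision/recall pooled over folds. All function names and the student records are hypothetical; the features (`programming_mark`, `prior_gpa`, `has_scholarship`, `weekly_lms_logins`) are invented for illustration, and the label is tied to the mark by construction, so the output says nothing about the paper's data or findings. In practice a library implementation such as scikit-learn's `RandomForestClassifier` would normally be used instead.

```python
import random


def binary_gini(pos, n):
    p = pos / n  # share of class-1 ("dropped out") samples in the node
    return 2.0 * p * (1.0 - p)  # equals 1 - p**2 - (1 - p)**2


def best_split(rows, labels, features):
    """Sweep sorted values of each candidate feature; return (child_gini, feature, threshold)."""
    n, total_pos, best = len(labels), sum(labels), None
    for f in features:
        pairs = sorted(zip((r[f] for r in rows), labels))
        left_n = left_pos = 0
        for i in range(n - 1):
            value, y = pairs[i]
            left_n, left_pos = left_n + 1, left_pos + y
            if value == pairs[i + 1][0]:
                continue  # only split between distinct values
            right_n, right_pos = n - left_n, total_pos - left_pos
            score = (left_n * binary_gini(left_pos, left_n)
                     + right_n * binary_gini(right_pos, right_n)) / n
            if best is None or score < best[0]:
                best = (score, f, value)  # rows with row[f] <= value go left
    return best  # None if no candidate feature varies within this node


def build_tree(rows, labels, depth, max_depth, n_sub, rng, importance):
    """Leaves are 0/1 labels; internal nodes are (feature, threshold, left, right)."""
    pos, n = sum(labels), len(labels)
    majority = int(2 * pos > n)
    if depth == max_depth or pos in (0, n):
        return majority
    candidates = rng.sample(range(len(rows[0])), n_sub)  # random feature subset per split
    split = best_split(rows, labels, candidates)
    if split is None:
        return majority
    score, f, t = split
    importance[f] += n * (binary_gini(pos, n) - score)  # weighted impurity decrease
    def grow(idx):
        return build_tree([rows[i] for i in idx], [labels[i] for i in idx],
                          depth + 1, max_depth, n_sub, rng, importance)
    left = [i for i, r in enumerate(rows) if r[f] <= t]
    right = [i for i, r in enumerate(rows) if r[f] > t]
    return (f, t, grow(left), grow(right))


def fit_forest(rows, labels, n_trees=15, max_depth=4, seed=0):
    """Bagged trees with random feature subsets, plus impurity-based feature importance."""
    rng = random.Random(seed)
    n_sub = max(1, round(len(rows[0]) ** 0.5))  # ~sqrt(#features), a common default
    importance, forest = [0.0] * len(rows[0]), []
    for _ in range(n_trees):
        boot = [rng.randrange(len(rows)) for _ in rows]  # bootstrap sample with replacement
        forest.append(build_tree([rows[i] for i in boot], [labels[i] for i in boot],
                                 0, max_depth, n_sub, rng, importance))
    return forest, importance


def predict(forest, row):
    votes = 0
    for node in forest:
        while isinstance(node, tuple):  # walk down to a leaf
            f, t, left, right = node
            node = left if row[f] <= t else right
        votes += node
    return int(2 * votes > len(forest))  # majority vote across trees


def cross_validate(rows, labels, k, seed=0):
    order = random.Random(seed).sample(range(len(rows)), len(rows))  # shuffled indices
    tp = fp = fn = tn = 0  # confusion counts pooled over folds; class 1 = dropped out
    for fold in range(k):
        test = set(order[fold::k])
        train = [i for i in order if i not in test]
        forest, _ = fit_forest([rows[i] for i in train], [labels[i] for i in train], seed=fold)
        for i in test:
            pred, true = predict(forest, rows[i]), labels[i]
            tp, fp = tp + (pred & true), fp + (pred & (1 - true))
            fn, tn = fn + ((1 - pred) & true), tn + ((1 - pred) & (1 - true))
    return {"accuracy": (tp + tn) / len(rows),
            "precision": tp / (tp + fp) if tp + fp else 0.0,
            "recall": tp / (tp + fn) if tp + fn else 0.0}


def make_synthetic_students(n=200, seed=42):
    """HYPOTHETICAL rows: [programming_mark, prior_gpa, has_scholarship, weekly_lms_logins]."""
    rng = random.Random(seed)
    rows = [[rng.uniform(20, 95), rng.uniform(1.0, 4.0), rng.randint(0, 1), rng.randint(0, 30)]
            for _ in range(n)]
    labels = [int(r[0] < 50) for r in rows]  # depends on the mark only, by construction
    return rows, labels


if __name__ == "__main__":
    rows, labels = make_synthetic_students()
    for k in (5, 10):  # the two settings compared in the abstract
        print(f"{k}-fold:", cross_validate(rows, labels, k))
    print("importance:", [round(v, 1) for v in fit_forest(rows, labels)[1]])
```

Running the script prints accuracy, precision, and recall for the 5-fold and 10-fold settings, followed by a per-feature importance list; on this toy data the first feature dominates only because that is how the labels were generated. Pooling confusion counts across folds, rather than averaging per-fold scores, is one design choice among several; the abstract does not say which approach the authors used.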