Using Ensemble Decision Tree Model to Predict Student Dropout in Computing Science

2019 IEEE Asia-Pacific Conference on Computer Science and Data Engineering (CSDE) Pub Date: 2019-12-01 DOI: 10.1109/CSDE48274.2019.9162389

M. Naseem, K. Chaudhary, B. Sharma, Aman Goel Lal


Citations: 9

Abstract

Science, Technology, Engineering and Mathematics (STEM) professionals play a key role in the development of an economy. STEM workers are critical thinkers who contribute immensely by driving innovation. There is a high demand for professionals in the STEM fields, but there is also a shortage of human resources in these areas. One way to reduce this problem is to identify students who are at risk of dropping out and then intervene with focused strategies that help these students remain in the same programme until graduation. Therefore, this research aims to use a data mining classification technique to identify students who are at risk of dropping out of their Computing Science (CS) degree programmes. The Random Forest (RF) decision tree algorithm is used to learn patterns from historical data about first-year undergraduate CS students enrolled in a tertiary institute in the South Pacific. A number of factors are used, comprising students’ demographic information, previous educational background, financial information, and data about students’ academic interaction. Feature selection is performed to determine which factors have the greater influence on students’ decisions to drop out. Cross-validation techniques are used to ensure that the models are not over-fitted. Two models were built, one using 5-fold and one using 10-fold cross-validation, and the results were compared using several measures of model performance. The results show that the factors corresponding to students’ academic performance in a first-year programming course had the greatest impact on student attrition in CS.
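The comparison of 5-fold and 10-fold cross-validation described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The data are synthetic placeholders, the function names (`k_fold_indices`, `cross_validate`, `fit_majority`, `predict_majority`) are hypothetical, and a trivial majority-class model stands in for Random Forest so that the example needs only the Python standard library. Accuracy is used here as one possible performance measure; the abstract does not say which measures the study used.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Shuffle indices 0..n_samples-1 and split them into k nearly equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # The first (n_samples % k) folds receive one extra sample.
    base, extra = divmod(n_samples, k)
    folds, start = [], 0
    for i in range(k):
        size = base + (1 if i < extra else 0)
        folds.append(indices[start:start + size])
        start += size
    return folds


def cross_validate(features, labels, k, fit, predict, seed=0):
    """Return the per-fold accuracy of a model under k-fold cross-validation."""
    scores = []
    for held_out in k_fold_indices(len(labels), k, seed):
        held = set(held_out)
        train = [i for i in range(len(labels)) if i not in held]
        # Fit on the k-1 training folds, then score on the held-out fold.
        model = fit([features[i] for i in train], [labels[i] for i in train])
        correct = sum(predict(model, features[i]) == labels[i] for i in held_out)
        scores.append(correct / len(held_out))
    return scores


# Stand-in model (NOT Random Forest): always predicts the majority training label.
def fit_majority(train_x, train_y):
    return max(set(train_y), key=train_y.count)


def predict_majority(model, x):
    return model


# Synthetic placeholder data (assumption): 100 students, 1 = dropped out, 0 = retained.
rng = random.Random(42)
X = [[rng.random()] for _ in range(100)]
y = [1 if rng.random() < 0.3 else 0 for _ in range(100)]

for k in (5, 10):
    scores = cross_validate(X, y, k, fit_majority, predict_majority)
    print(f"{k}-fold mean accuracy: {sum(scores) / len(scores):.3f}")
```

In practice the `fit` and `predict` arguments would wrap a real Random Forest implementation. Each sample is held out exactly once, so every prediction that is scored comes from a model that did not see that sample during training, which is what makes the averaged score a check against over-fitting.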



