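Using machine learning methods to identify significant variables for the prediction of first-year Informatics Engineering students dropout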

F. Robles, Jacqueline Köhler, Karen Hinrechsen, V. Araya, Luciano Hidalgo, J. Jara
Published in: 2020 39th International Conference of the Chilean Computer Science Society (SCCC)
Publication date: 2020-11-16
DOI: 10.1109/SCCC51225.2020.9281280 (https://doi.org/10.1109/SCCC51225.2020.9281280)
Citations: 6

Abstract

Student dropout is a phenomenon that affects all higher education institutions in Chile, with costs for people, institutions and the State. The reported retention rate of first-year students across all Chilean universities was 75%. Despite extensive research and the implementation of various models to identify dropout causes and risk groups, few of these studies have been carried out in the Chilean higher education context. Our work attempts to identify, using machine learning methods, the variables with the highest predictive value for student dropout by the end of the first year of study, within a 6-year Informatics Engineering programme with a rather high dropout rate of 21.9% reported in 2018. To that end, we use the data of 4 cohorts of students (2012-2016) enrolled in the programme to feed a random forest feature selection process. We then build a decision tree from the identified relevant features and test it on data from the 2017-2018 cohorts of students. Although the decision tree is over-fitted (97.21% training accuracy against 81.01% test accuracy), the process sheds light on the nature of the variables that determine whether or not a student remains at the end of their first year of study at the University. Six of the identified factors are academic, and the remaining one is socio-cultural.
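The abstract names its pipeline (impurity-based feature ranking with a random forest, then a decision tree evaluated on later cohorts) but gives no implementation details. The sketch below is only an illustration of the general idea, not the authors' method: it ranks features by the Gini impurity decrease of a single best threshold split, averaged over bootstrap resamples, which is a much-simplified stand-in for random forest feature importance. The data are synthetic, and the feature names (`grade`, `noise`), the sample size, the 4.0 cut-off and the 10% label noise are all assumptions made for the example. The last lines compute the train/test accuracy gap from the two figures reported in the abstract.

```python
import random


def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)


def best_split_gain(values, labels):
    """Largest Gini impurity decrease from one threshold on one feature."""
    n = len(labels)
    parent = gini(labels)
    best = 0.0
    for t in sorted(set(values))[:-1]:  # every candidate threshold
        left = [y for v, y in zip(values, labels) if v <= t]
        right = [y for v, y in zip(values, labels) if v > t]
        child = (len(left) * gini(left) + len(right) * gini(right)) / n
        best = max(best, parent - child)
    return best


def bootstrap_importance(rows, labels, n_rounds=20, seed=0):
    """Average each feature's best-split gain over bootstrap resamples."""
    rng = random.Random(seed)
    n, d = len(rows), len(rows[0])
    totals = [0.0] * d
    for _ in range(n_rounds):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        ys = [labels[i] for i in idx]
        for j in range(d):
            totals[j] += best_split_gain([rows[i][j] for i in idx], ys)
    return [t / n_rounds for t in totals]


# Synthetic data (assumption): dropout depends on a hypothetical grade average
# on a 1-7 scale and not at all on the second, purely random feature.
data_rng = random.Random(42)
rows, labels = [], []
for _ in range(120):
    grade = data_rng.uniform(1.0, 7.0)
    noise = data_rng.uniform(0.0, 1.0)
    dropped = 1 if grade < 4.0 else 0
    if data_rng.random() < 0.1:  # flip 10% of labels to mimic noisy records
        dropped = 1 - dropped
    rows.append((grade, noise))
    labels.append(dropped)

importance = bootstrap_importance(rows, labels)
ranking = sorted(range(len(importance)), key=lambda j: importance[j], reverse=True)
print("feature ranking (0 = grade, 1 = noise):", ranking)

# Train/test gap from the accuracies reported in the abstract; the gap is
# about 16.2 percentage points, which is why the tree is called over-fitted.
train_acc, test_acc = 97.21, 81.01
gap = round(train_acc - test_acc, 2)
print("accuracy gap (percentage points):", gap)
```

In practice a library implementation of random forests would replace the stump-based ranking here. The sketch only shows why an informative variable accumulates a larger average impurity decrease than an irrelevant one.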