F. Robles, Jacqueline Köhler, Karen Hinrechsen, V. Araya, Luciano Hidalgo, J. Jara
Published in: 2020 39th International Conference of the Chilean Computer Science Society (SCCC), 2020-11-16. DOI: 10.1109/SCCC51225.2020.9281280 (https://doi.org/10.1109/SCCC51225.2020.9281280)
Using machine learning methods to identify significant variables for the prediction of first-year Informatics Engineering students dropout
Student dropout is a phenomenon that affects all higher education institutions in Chile, with costs for people, institutions and the State. The reported retention rate of first-year students across all Chilean universities was 75%. Despite extensive research and the implementation of various models to identify dropout causes and risk groups, few of these studies have been carried out in the Chilean higher education context. Our work attempts to identify, using machine learning methods, the variables with the highest predictive value for student dropout by the end of the first year of study, within a 6-year Informatics Engineering programme with a rather high dropout rate of 21.9% reported in 2018. To that end, we use data from 4 cohorts of students (2012-2016) enrolled in the programme to feed a random forest feature selection process. We then build a decision tree using the identified relevant features, which we test using data from the 2017-2018 cohorts of students. Although the decision tree is over-fitted (97.21% training accuracy against 81.01% test accuracy), the process sheds light on the nature of the variables that determine whether or not a student remains at the end of their first year of study at the University. Six of the identified factors are academic, and the remaining one is socio-cultural.
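The workflow summarised in the abstract (random forest feature selection on earlier cohorts, a decision tree on the selected features, evaluation on later cohorts) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the data are synthetic, the feature count, the importance-ranking selection rule, and all variable names are assumptions, and scikit-learn is assumed as the library. Only the cohort split (entry years up to 2016 for training, 2017-2018 for testing) and the number of retained features (seven) follow the abstract.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)

# Synthetic stand-in for student records (the real data set is not available here).
n_students, n_features = 600, 20
X = rng.normal(size=(n_students, n_features))

# Hypothetical ground truth: dropout depends on three columns plus noise.
score = 1.5 * X[:, 0] - 1.0 * X[:, 1] + 0.8 * X[:, 2]
score += rng.normal(scale=0.5, size=n_students)
y = (score > 0.8).astype(int)  # 1 = student dropped out in first year

# Hypothetical entry year per student, 2012..2018 inclusive.
cohort = rng.integers(2012, 2019, size=n_students)

# Temporal split: earlier cohorts for training, later cohorts for testing.
train = cohort <= 2016
test = ~train

# Step 1: rank features by random forest importance (assumed selection rule).
forest = RandomForestClassifier(n_estimators=200, random_state=0)
forest.fit(X[train], y[train])
top_k = 7  # the abstract reports seven identified factors
selected = np.argsort(forest.feature_importances_)[::-1][:top_k]

# Step 2: fit an unpruned decision tree on the selected features only.
tree = DecisionTreeClassifier(random_state=0)
tree.fit(X[train][:, selected], y[train])

# Step 3: compare training and test accuracy to gauge over-fitting.
train_acc = accuracy_score(y[train], tree.predict(X[train][:, selected]))
test_acc = accuracy_score(y[test], tree.predict(X[test][:, selected]))
print(f"selected feature indices: {sorted(selected.tolist())}")
print(f"train accuracy: {train_acc:.4f}, test accuracy: {test_acc:.4f}")
print(f"gap: {100 * (train_acc - test_acc):.2f} percentage points")
```

The gap between training and test accuracy is the over-fitting signal the abstract refers to; for the reported figures it is 97.21% - 81.01% = 16.20 percentage points. In a sketch like this, limiting the tree (for example through `max_depth` or `min_samples_leaf`) would be one common way to try to narrow that gap, although the abstract does not say whether the authors did so.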