Early Performance Prediction for CS1 Course Students using a Combination of Machine Learning and an Evolutionary Algorithm
F. Pereira, Elaine H. T. Oliveira, David Fernandes, A. Cristea
2019 IEEE 19th International Conference on Advanced Learning Technologies (ICALT), 2019-07-01
DOI: 10.1109/ICALT.2019.00066 (https://doi.org/10.1109/ICALT.2019.00066)
Citations: 29
Abstract
Many researchers have started extracting student behaviour by cleaning data collected from web environments and using it as features in machine learning (ML) models. Using log data collected from an online judge, we compiled a set of successful features correlated with student grades and applied them to a database representing 486 CS1 students. We used this set of features in optimised ML pipelines, combining an automated approach based on an evolutionary algorithm with hyperparameter tuning by random search. As a result, we achieved an accuracy of 75.55% in predicting students' final grades, using data from only the first two weeks of the course. We show how our pipeline outperforms state-of-the-art work on similar scenarios.
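The abstract does not specify the pipeline's components, so the following is only a minimal sketch of the general idea of pairing an evolutionary search with random-search hyperparameter tuning. Everything in it is an assumption made for illustration: the data is synthetic, the feature names (`attempts`, `days`) are hypothetical, the classifier is a plain k-nearest-neighbours vote, and the evolutionary loop searches over feature masks. It is not the authors' method and does not reproduce their results.

```python
import random

def make_data(n, rng):
    """Synthetic stand-in for early log features; NOT the paper's data."""
    rows = []
    for _ in range(n):
        passed = rng.random() < 0.5
        # Two informative features (hypothetical names) and two pure-noise features.
        attempts = rng.gauss(6.0 if passed else 3.0, 1.5)
        days = rng.gauss(5.0 if passed else 3.0, 1.5)
        rows.append(([attempts, days, rng.gauss(0, 1), rng.gauss(0, 1)], int(passed)))
    return rows

def knn_predict(train, x, k, mask):
    """Majority vote among the k nearest training rows, using only masked-in features."""
    def dist(a, b):
        return sum((ai - bi) ** 2 for ai, bi, m in zip(a, b, mask) if m)
    nearest = sorted(train, key=lambda row: dist(row[0], x))[:k]
    votes = sum(label for _, label in nearest)
    return int(2 * votes > k)  # k is always odd here, so there are no ties

def accuracy(train, rows, k, mask):
    """Fraction of rows whose label the k-NN vote predicts correctly."""
    return sum(knn_predict(train, x, k, mask) == y for x, y in rows) / len(rows)

def random_search_k(train, valid, mask, rng, trials=4):
    """Random search: sample odd k values, keep the best (accuracy, k) pair."""
    best = (0.0, 1)
    for _ in range(trials):
        k = rng.choice([1, 3, 5, 7, 9, 11])
        best = max(best, (accuracy(train, valid, k, mask), k))
    return best

def evolve(train, valid, rng, n_features=4, pop_size=6, generations=5):
    """Tiny evolutionary loop over feature masks: elitism plus bit-flip mutation."""
    def random_mask():
        mask = [rng.random() < 0.5 for _ in range(n_features)]
        return mask if any(mask) else random_mask()

    population = [random_mask() for _ in range(pop_size)]
    best = None
    for _ in range(generations):
        # Fitness of a mask = best validation accuracy found by random search.
        scored = [(random_search_k(train, valid, m, rng), m) for m in population]
        scored.sort(key=lambda s: s[0], reverse=True)
        if best is None or scored[0][0] > best[0]:
            best = scored[0]
        parents = [m for _, m in scored[: pop_size // 2]]
        children = []
        for parent in parents:
            child = list(parent)
            i = rng.randrange(n_features)
            child[i] = not child[i]  # flip one feature in or out
            children.append(child if any(child) else list(parent))
        population = parents + children
    (acc, k), mask = best
    return acc, k, mask

rng = random.Random(0)
data = make_data(200, rng)
train, valid, test = data[:120], data[120:160], data[160:]
val_acc, best_k, best_mask = evolve(train, valid, rng)
test_acc = accuracy(train, test, best_k, best_mask)
print(f"mask={best_mask} k={best_k} validation={val_acc:.2f} test={test_acc:.2f}")
```

The held-out `test` split is scored only once, after the search finishes, so the reported test accuracy is not inflated by the selection performed on the validation split.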