Stephen Hutt, Margo Gardener, Donald Kamentz, A. Duckworth, S. D’Mello
Title: Prospectively predicting 4-year college graduation from student applications
Published in: Proceedings of the 8th International Conference on Learning Analytics and Knowledge, 2018-03-07
DOI: 10.1145/3170358.3170395 (https://doi.org/10.1145/3170358.3170395)
Citations: 23
Abstract
We leverage a unique national dataset of 41,359 college applications to prospectively predict 4-year bachelor's graduation in a generalizable manner. Our features include sociodemographics, institutional graduation rates, academic achievement, standardized test scores, engagement in extracurricular activities, work experiences, and ratings by teachers and high-school guidance counselors. A random forest classifier correctly predicted 4-year graduation for 71.4% of the students (base rate = 44%) using all 166 of the aforementioned features and a split-half validation method. A stochastic hill-climbing feature selection procedure maintained the same classification accuracy with a minimal set of 37 features, consisting of an approximately equal representation of sociodemographic, cognitive, and noncognitive factors. We advocate against using these results for admissions decisions, instead contemplating how they might be used to provide parents and educators with actionable information to guide students towards college success.
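The abstract does not spell out how the stochastic hill-climbing feature selection works, so the following is only a generic sketch of the technique, not the authors' procedure. It assumes a hypothetical `score` callback that returns a validation accuracy for a feature subset (in the paper's setting this could be a random forest evaluated on a held-out half of the data), and it replaces that with a toy scoring function so the example runs on the standard library alone. The names `hill_climb_select`, `toy_score`, and `INFORMATIVE` are illustrative assumptions.

```python
import random
from typing import Callable, FrozenSet, Sequence


def hill_climb_select(
    features: Sequence[str],
    score: Callable[[FrozenSet[str]], float],
    n_iters: int = 200,
    seed: int = 0,
) -> FrozenSet[str]:
    """Stochastic hill climbing over feature subsets (generic sketch)."""
    rng = random.Random(seed)
    current = frozenset(features)  # start from the full feature set
    best = score(current)
    for _ in range(n_iters):
        # Propose a neighbour by toggling one randomly chosen feature.
        candidate = current ^ {rng.choice(features)}
        if not candidate:
            continue  # never evaluate an empty feature set
        s = score(candidate)
        # Accept if the score improves, or if it ties with fewer features.
        if s > best or (s == best and len(candidate) < len(current)):
            current, best = candidate, s
    return current


# Toy stand-in for validation accuracy (an assumption for illustration):
# only three of ten features carry any signal.
ALL_FEATURES = [f"f{i}" for i in range(10)]
INFORMATIVE = frozenset({"f0", "f3", "f7"})


def toy_score(subset: FrozenSet[str]) -> float:
    # Fraction of the informative features present in the subset.
    return len(subset & INFORMATIVE) / len(INFORMATIVE)


selected = hill_climb_select(ALL_FEATURES, toy_score)
```

On this toy score the search can only drop features that do not change the score, so it keeps every informative feature while shrinking the subset. With a real, noisy accuracy estimate the tie-breaking rule would likely need a tolerance instead of exact equality.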