Lingjun He, R. Levine, J. Fan, Joshua Beemer, Jeanne Stronach
{"title":"随机森林在机构研究中替代回归的预测分析方法。","authors":"Lingjun He, R. Levine, J. Fan, Joshua Beemer, Jeanne Stronach","doi":"10.7275/1WPR-M024","DOIUrl":null,"url":null,"abstract":"In institutional research, modern data mining approaches are seldom considered to address predictive analytics problems. The goal of this paper is to highlight the advantages of tree-based machine learning algorithms over classic (logistic) regression methods for data-informed decision making in higher education problems, and stress the success of random forest in circumstances where the regression assumptions are often violated in big data applications. Random forest is a model averaging procedure where each tree is constructed based on a bootstrap sample of the data set. In particular, we emphasize the ease of application, low computational cost, high predictive accuracy, flexibility, and interpretability of random forest machinery. Our overall recommendation is that institutional researchers look beyond classical regression and single decision tree analytics tools, and consider random forest as the predominant method for prediction tasks. The proposed points of view are detailed and illustrated through a simulation experiment and analyses of data from real institutional research projects.","PeriodicalId":20361,"journal":{"name":"Practical Assessment, Research and Evaluation","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2018-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"36","resultStr":"{\"title\":\"Random Forest as a Predictive Analytics Alternative to Regression in Institutional Research.\",\"authors\":\"Lingjun He, R. Levine, J. Fan, Joshua Beemer, Jeanne Stronach\",\"doi\":\"10.7275/1WPR-M024\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"In institutional research, modern data mining approaches are seldom considered to address predictive analytics problems. The goal of this paper is to highlight the advantages of tree-based machine learning algorithms over classic (logistic) regression methods for data-informed decision making in higher education problems, and stress the success of random forest in circumstances where the regression assumptions are often violated in big data applications. Random forest is a model averaging procedure where each tree is constructed based on a bootstrap sample of the data set. In particular, we emphasize the ease of application, low computational cost, high predictive accuracy, flexibility, and interpretability of random forest machinery. Our overall recommendation is that institutional researchers look beyond classical regression and single decision tree analytics tools, and consider random forest as the predominant method for prediction tasks. The proposed points of view are detailed and illustrated through a simulation experiment and analyses of data from real institutional research projects.\",\"PeriodicalId\":20361,\"journal\":{\"name\":\"Practical Assessment, Research and Evaluation\",\"volume\":null,\"pages\":null},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2018-01-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"36\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Practical Assessment, Research and Evaluation\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.7275/1WPR-M024\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q2\",\"JCRName\":\"Social Sciences\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Practical Assessment, Research and Evaluation","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.7275/1WPR-M024","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"Social Sciences","Score":null,"Total":0}
Random Forest as a Predictive Analytics Alternative to Regression in Institutional Research.
In institutional research, modern data mining approaches are seldom considered to address predictive analytics problems. The goal of this paper is to highlight the advantages of tree-based machine learning algorithms over classic (logistic) regression methods for data-informed decision making in higher education problems, and stress the success of random forest in circumstances where the regression assumptions are often violated in big data applications. Random forest is a model averaging procedure where each tree is constructed based on a bootstrap sample of the data set. In particular, we emphasize the ease of application, low computational cost, high predictive accuracy, flexibility, and interpretability of random forest machinery. Our overall recommendation is that institutional researchers look beyond classical regression and single decision tree analytics tools, and consider random forest as the predominant method for prediction tasks. The proposed points of view are detailed and illustrated through a simulation experiment and analyses of data from real institutional research projects.