Pooling and winsorizing machine learning forecasts to predict stock returns with high-dimensional data
Authors: Erik Mekelburg, Jack Strauss
Journal: Journal of Empirical Finance, volume 79, Article 101538
Publication date: 2024-09-02
DOI: 10.1016/j.jempfin.2024.101538
URL: https://www.sciencedirect.com/science/article/pii/S0927539824000732
Citation count: 0
Abstract
We evaluate US market return predictability using a novel data set of several hundred aggregated firm-level characteristics. We apply LASSO, Elastic Net, Random Forest, Neural Net, Extreme Gradient Boosting, and Light Gradient Boosting Machine methods and find that these models experience large prediction errors that lead to forecast failures. However, winsorizing and pooling machine learning model forecasts provide consistent out-of-sample predictability. To assess robustness, we apply machine learning methods to high-dimensional data for Canada, China, Germany, and the UK, as well as the Goyal–Welch data. All machine learning models we consider, except for the ensemble pooled methods, fail to significantly predict returns across our samples, highlighting the importance of pooling and of evaluating additional economies, as well as the fragility of individual machine learning methods. Our results shed light on the sparsity versus density debate, as the degree of sparsity and variable importance evolve over time.
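The abstract does not spell out how forecasts are winsorized or pooled, so the following is only a minimal sketch of the general idea, not the authors' procedure. It assumes a matrix of forecasts with one row per period and one column per model, clips each model's forecasts at hypothetical 5th and 95th percentile cutoffs, and pools with a simple equal-weight average. The function names, the cutoffs, and the equal weighting are all illustrative assumptions. The percentiles are also computed over the full sample for brevity; a genuine out-of-sample exercise would presumably use only information available at each forecast date.

```python
import numpy as np


def winsorize_forecasts(forecasts, lower_pct=5.0, upper_pct=95.0):
    """Clip each model's forecasts (columns) to its own percentile bounds.

    The cutoffs are hypothetical defaults, not values taken from the paper.
    """
    forecasts = np.asarray(forecasts, dtype=float)
    lower = np.percentile(forecasts, lower_pct, axis=0)  # one bound per model
    upper = np.percentile(forecasts, upper_pct, axis=0)
    return np.clip(forecasts, lower, upper)


def pool_forecasts(forecasts):
    """Equal-weight average across models for each period (assumed scheme)."""
    return np.asarray(forecasts, dtype=float).mean(axis=1)


# Toy data: 4 periods (rows) x 3 models (columns); the third model
# produces one extreme forecast in the third period.
raw = np.array([
    [0.01, 0.02, 0.00],
    [0.02, 0.01, 0.01],
    [0.00, 0.03, 5.00],
    [0.01, 0.02, 0.02],
])

clipped = winsorize_forecasts(raw)
pooled = pool_forecasts(clipped)
print(pooled)
```

In this toy example the extreme forecast is pulled toward the rest of that model's distribution before averaging, so it has less influence on the pooled forecast than it would in a plain average of the raw values.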
About the journal:
The Journal of Empirical Finance is a financial economics journal whose aim is to publish high quality articles in empirical finance. Empirical finance is interpreted broadly to include any type of empirical work in financial economics, financial econometrics, and also theoretical work with clear empirical implications, even when there is no empirical analysis. The Journal welcomes articles in all fields of finance, such as asset pricing, corporate finance, financial econometrics, banking, international finance, microstructure, behavioural finance, etc. The Editorial Team is willing to take risks on innovative research, controversial papers, and unusual approaches. We are also particularly interested in work produced by young scholars. The composition of the editorial board reflects such goals.