Pooling and winsorizing machine learning forecasts to predict stock returns with high-dimensional data

IF 2.1; Zone 2 (Economics); JCR Q2 (Business, Finance)
Erik Mekelburg, Jack Strauss
Journal of Empirical Finance, Volume 79, Article 101538. Published 2024-09-02.
DOI: 10.1016/j.jempfin.2024.101538
URL: https://www.sciencedirect.com/science/article/pii/S0927539824000732
Citations: 0

Abstract

We evaluate US market return predictability using a novel data set of several hundred aggregated firm-level characteristics. We apply LASSO, Elastic Net, Random Forest, Neural Net, Extreme Gradient Boosting, and Light Gradient Boosting Machine methods and find that these models experience large prediction errors that lead to forecast failures. However, winsorizing and pooling machine learning model forecasts provide consistent out-of-sample predictability. To assess robustness, we apply machine learning methods to high-dimensional data for Canada, China, Germany and the UK as well as the Goyal–Welch data. All machine learning models we consider, except for the ensemble pooled methods, fail to significantly predict returns across our samples, highlighting the importance of pooling, evaluating additional economies, and the fragility of individual machine learning methods. Our results shed light on the sparsity versus density debate, as the degree of sparsity and variable importance evolve over time.
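
The winsorize-then-pool idea in the abstract can be illustrated with a minimal sketch. It is not the authors' procedure: the abstract does not specify the winsorizing rule, the pooling weights, or the evaluation statistic. The fixed bounds `lower` and `upper`, the equal-weight average, the constant benchmark forecast, and the toy numbers are all assumptions chosen for illustration. The out-of-sample R² here is the commonly used ratio of squared forecast errors relative to a benchmark.

```python
import numpy as np


def winsorize(forecasts, lower, upper):
    """Clip every forecast to [lower, upper] (bounds are hypothetical)."""
    return np.clip(forecasts, lower, upper)


def pool(forecasts):
    """Equal-weight average across models (columns) for each period (row)."""
    return np.mean(forecasts, axis=1)


def oos_r2(actual, forecast, benchmark):
    """Out-of-sample R^2: 1 - SSE(model) / SSE(benchmark)."""
    sse_model = np.sum((actual - forecast) ** 2)
    sse_bench = np.sum((actual - benchmark) ** 2)
    return 1.0 - sse_model / sse_bench


# Toy data (invented): 4 periods, 3 models; the third model produces
# occasional extreme forecasts of the kind that can cause forecast failures.
actual = np.array([0.01, -0.02, 0.03, 0.00])
benchmark = np.full(4, 0.005)  # stand-in for a historical-mean forecast
model_forecasts = np.array([
    [0.02, 0.01, 0.50],
    [-0.01, -0.03, -0.40],
    [0.02, 0.04, 0.01],
    [0.00, 0.01, 0.30],
])

lower, upper = -0.05, 0.05  # assumed winsorizing bounds
clipped = winsorize(model_forecasts, lower, upper)

pooled_raw = pool(model_forecasts)   # pooling without winsorizing
pooled_wins = pool(clipped)          # winsorize first, then pool

r2_raw = oos_r2(actual, pooled_raw, benchmark)
r2_wins = oos_r2(actual, pooled_wins, benchmark)
print(f"OOS R2, pooled raw forecasts:        {r2_raw:.3f}")
print(f"OOS R2, pooled winsorized forecasts: {r2_wins:.3f}")
```

In this toy example a single model's outliers dominate the unwinsorized pooled forecast, while clipping before averaging limits their influence. In practice the bounds would be set using only information available at forecast time.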

Source journal
CiteScore: 3.40
Self-citation rate: 3.80%
Articles published: 59
Journal description: The Journal of Empirical Finance is a financial economics journal whose aim is to publish high quality articles in empirical finance. Empirical finance is interpreted broadly to include any type of empirical work in financial economics, financial econometrics, and also theoretical work with clear empirical implications, even when there is no empirical analysis. The Journal welcomes articles in all fields of finance, such as asset pricing, corporate finance, financial econometrics, banking, international finance, microstructure, behavioural finance, etc. The Editorial Team is willing to take risks on innovative research, controversial papers, and unusual approaches. We are also particularly interested in work produced by young scholars. The composition of the editorial board reflects such goals.