Forecasting relative returns for S&P 500 stocks using machine learning
Authors: Htet Htet Htun, Michael Biehl, Nicolai Petkov
Journal: Financial Innovation (impact factor 6.9; JCR Q1, Business, Finance)
Published: 2024-04-20
DOI: 10.1186/s40854-024-00644-0 (https://doi.org/10.1186/s40854-024-00644-0)
Abstract
Forecasting changes in stock prices is extremely challenging given that numerous factors cause these prices to fluctuate. The random walk hypothesis and efficient market hypothesis essentially state that it is not possible to systematically, reliably predict future stock prices or forecast changes in the stock market overall. Nonetheless, machine learning (ML) techniques that use historical data have been applied to make such predictions. Previous studies focused on a small number of stocks and claimed success with limited statistical confidence. In this study, we construct feature vectors composed of multiple previous relative returns and apply the random forest (RF), support vector machine (SVM), and long short-term memory (LSTM) ML methods as classifiers to predict whether a stock can return 2% more than its index in the following 10 days. We apply this approach to all S&P 500 companies for the period 2017–2022. We assess performance using accuracy, precision, and recall and compare our results with a random choice strategy. We observe that the LSTM classifier outperforms RF and SVM, and the data-driven ML methods outperform the random choice classifier (p = 8.46e−17 for accuracy of LSTM). Thus, we demonstrate that the probability that the random walk and efficient market hypotheses hold in the considered context is negligibly small.
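The feature and label construction described in the abstract can be illustrated with a minimal sketch. This is not the authors' code: the function names, the choice of five lagged daily relative returns, and the definition of relative return as stock return minus index return are assumptions made for illustration. Only the 10-day horizon and the 2% threshold come from the abstract, and "days" is read here as rows of the price series.

```python
import numpy as np


def relative_returns(stock, index, horizon):
    """Stock return minus index return over `horizon` steps (assumed definition)."""
    stock = np.asarray(stock, dtype=float)
    index = np.asarray(index, dtype=float)
    stock_ret = stock[horizon:] / stock[:-horizon] - 1.0
    index_ret = index[horizon:] / index[:-horizon] - 1.0
    return stock_ret - index_ret


def make_dataset(stock, index, n_lags=5, horizon=10, threshold=0.02):
    """Build (X, y) for a binary classifier.

    X[k] holds the n_lags most recent one-step relative returns known at day t.
    y[k] is 1 if the stock beats the index by more than `threshold`
    over the following `horizon` steps, else 0.
    n_lags is a hypothetical choice; the abstract only says "multiple".
    """
    daily = relative_returns(stock, index, 1)         # daily[i]: day i -> i+1
    future = relative_returns(stock, index, horizon)  # future[t]: day t -> t+horizon
    X, y = [], []
    for t in range(n_lags, len(future)):
        X.append(daily[t - n_lags:t])   # uses prices up to day t only
        y.append(int(future[t] > threshold))
    return np.array(X), np.array(y)


def accuracy_precision_recall(y_true, y_pred):
    """Compute the three metrics named in the abstract for binary labels."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    tp = int(np.sum((y_true == 1) & (y_pred == 1)))
    fp = int(np.sum((y_true == 0) & (y_pred == 1)))
    fn = int(np.sum((y_true == 1) & (y_pred == 0)))
    accuracy = float(np.mean(y_true == y_pred))
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, precision, recall


# Toy data: a stock rising 1% per step against a flat index.
stock_prices = [100.0 * 1.01 ** i for i in range(30)]
index_prices = [100.0] * 30
X, y = make_dataset(stock_prices, index_prices)
```

Any classifier (for example a random forest) could then be fitted on `X` and `y`, and its predictions scored with `accuracy_precision_recall` against a baseline that guesses labels at random. Features at day `t` use prices up to day `t` only, so the label window never overlaps the inputs.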
About the journal:
Financial Innovation (FIN), a Springer OA journal sponsored by Southwestern University of Finance and Economics, serves as a global academic platform for sharing research findings in all aspects of financial innovation during the electronic business era. It facilitates interactions among researchers, policymakers, and practitioners, focusing on new financial instruments, technologies, markets, and institutions. Emphasizing emerging financial products enabled by disruptive technologies, FIN publishes high-quality academic and practical papers. The journal is peer-reviewed, indexed in SSCI, Scopus, Google Scholar, CNKI, CQVIP, and more.