Assessing Look-Ahead Bias in Stock Return Predictions Generated By GPT Sentiment Analysis

Paul Glasserman, Caden Lin
{"title":"GPT情绪分析在股票收益预测中的预估偏差","authors":"Paul Glasserman, Caden Lin","doi":"arxiv-2309.17322","DOIUrl":null,"url":null,"abstract":"Large language models (LLMs), including ChatGPT, can extract profitable\ntrading signals from the sentiment in news text. However, backtesting such\nstrategies poses a challenge because LLMs are trained on many years of data,\nand backtesting produces biased results if the training and backtesting periods\noverlap. This bias can take two forms: a look-ahead bias, in which the LLM may\nhave specific knowledge of the stock returns that followed a news article, and\na distraction effect, in which general knowledge of the companies named\ninterferes with the measurement of a text's sentiment. We investigate these\nsources of bias through trading strategies driven by the sentiment of financial\nnews headlines. We compare trading performance based on the original headlines\nwith de-biased strategies in which we remove the relevant company's identifiers\nfrom the text. In-sample (within the LLM training window), we find,\nsurprisingly, that the anonymized headlines outperform, indicating that the\ndistraction effect has a greater impact than look-ahead bias. This tendency is\nparticularly strong for larger companies--companies about which we expect an\nLLM to have greater general knowledge. Out-of-sample, look-ahead bias is not a\nconcern but distraction remains possible. Our proposed anonymization procedure\nis therefore potentially useful in out-of-sample implementation, as well as for\nde-biased backtesting.","PeriodicalId":501372,"journal":{"name":"arXiv - QuantFin - General Finance","volume":"6 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2023-09-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Assessing Look-Ahead Bias in Stock Return Predictions Generated By GPT Sentiment Analysis\",\"authors\":\"Paul Glasserman, Caden Lin\",\"doi\":\"arxiv-2309.17322\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Large language models (LLMs), including ChatGPT, can extract profitable\\ntrading signals from the sentiment in news text. However, backtesting such\\nstrategies poses a challenge because LLMs are trained on many years of data,\\nand backtesting produces biased results if the training and backtesting periods\\noverlap. This bias can take two forms: a look-ahead bias, in which the LLM may\\nhave specific knowledge of the stock returns that followed a news article, and\\na distraction effect, in which general knowledge of the companies named\\ninterferes with the measurement of a text's sentiment. We investigate these\\nsources of bias through trading strategies driven by the sentiment of financial\\nnews headlines. We compare trading performance based on the original headlines\\nwith de-biased strategies in which we remove the relevant company's identifiers\\nfrom the text. In-sample (within the LLM training window), we find,\\nsurprisingly, that the anonymized headlines outperform, indicating that the\\ndistraction effect has a greater impact than look-ahead bias. This tendency is\\nparticularly strong for larger companies--companies about which we expect an\\nLLM to have greater general knowledge. Out-of-sample, look-ahead bias is not a\\nconcern but distraction remains possible. 
Our proposed anonymization procedure\\nis therefore potentially useful in out-of-sample implementation, as well as for\\nde-biased backtesting.\",\"PeriodicalId\":501372,\"journal\":{\"name\":\"arXiv - QuantFin - General Finance\",\"volume\":\"6 1\",\"pages\":\"\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2023-09-29\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"arXiv - QuantFin - General Finance\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/arxiv-2309.17322\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"arXiv - QuantFin - General Finance","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/arxiv-2309.17322","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

Abstract

Large language models (LLMs), including ChatGPT, can extract profitable trading signals from the sentiment in news text. However, backtesting such strategies poses a challenge because LLMs are trained on many years of data, and backtesting produces biased results if the training and backtesting periods overlap. This bias can take two forms: a look-ahead bias, in which the LLM may have specific knowledge of the stock returns that followed a news article, and a distraction effect, in which general knowledge of the companies named interferes with the measurement of a text's sentiment. We investigate these sources of bias through trading strategies driven by the sentiment of financial news headlines. We compare trading performance based on the original headlines with de-biased strategies in which we remove the relevant company's identifiers from the text. In-sample (within the LLM training window), we find, surprisingly, that the anonymized headlines outperform, indicating that the distraction effect has a greater impact than look-ahead bias. This tendency is particularly strong for larger companies--companies about which we expect an LLM to have greater general knowledge. Out-of-sample, look-ahead bias is not a concern but distraction remains possible. Our proposed anonymization procedure is therefore potentially useful in out-of-sample implementation, as well as for de-biased backtesting.
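The de-biasing procedure described above lends itself to a short sketch. The code below is a minimal illustration, not the authors' pipeline: the `anonymize_headline` helper, the alias list, the placeholder token, and the YES/NO sentiment prompt are all assumptions introduced here for illustration; the paper only specifies that the relevant company's identifiers are removed from the headline before an LLM scores its sentiment.

```python
import re
from typing import Callable, List

def anonymize_headline(headline: str, aliases: List[str],
                       placeholder: str = "the company") -> str:
    """Replace every company identifier (name, ticker, common alias)
    with a neutral placeholder, so the LLM cannot draw on prior
    knowledge of the firm. The alias list is a stand-in: the paper
    removes 'the relevant company's identifiers' from the text."""
    out = headline
    # Replace longest aliases first so e.g. "Apple Inc." beats "Apple".
    for alias in sorted(aliases, key=len, reverse=True):
        out = re.sub(re.escape(alias), placeholder, out, flags=re.IGNORECASE)
    return out

# A hypothetical prompt in the style of common GPT-sentiment setups,
# not necessarily the authors' exact wording.
SENTIMENT_PROMPT = (
    "Answer only YES, NO, or UNKNOWN: is this headline good news "
    "for the company's stock price?\nHeadline: {headline}"
)

def sentiment_signal(headline: str, llm_complete: Callable[[str], str]) -> int:
    """Map the LLM's answer to a trading signal: +1 long, -1 short, 0 flat.
    `llm_complete` stands in for any chat-completion API call."""
    answer = llm_complete(SENTIMENT_PROMPT.format(headline=headline))
    return {"YES": 1, "NO": -1}.get(answer.strip().upper(), 0)

if __name__ == "__main__":
    fake_llm = lambda prompt: "YES"  # stub; swap in a real LLM call
    raw = "Apple beats quarterly earnings expectations"
    anon = anonymize_headline(raw, aliases=["Apple", "AAPL"])
    print(anon)  # -> "the company beats quarterly earnings expectations"
    print(sentiment_signal(raw, fake_llm), sentiment_signal(anon, fake_llm))
```

Running both the raw and anonymized headlines through the same sentiment model over the LLM's training window is what lets the paper separate the two biases: any in-sample performance gap between the two signal streams must come from the model's knowledge of the named company, whether specific (look-ahead) or general (distraction).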