What Teaches Robots to Walk, Teaches Them to Trade too -- Regime Adaptive Execution using Informed Data and LLMs
Raeid Saqur
arXiv - QuantFin - Computational Finance, 2024-06-20. arXiv:2406.15508
Citations: 0
Abstract
Machine learning techniques applied to the problem of financial market
forecasting struggle with dynamic regime switching, i.e., shifts in the underlying
correlation and covariance of the true (hidden) market variables. Drawing inspiration
from the success of reinforcement learning in robotics, particularly in agile
locomotion adaptation of quadruped robots to unseen terrains, we introduce an
innovative approach that leverages the world knowledge of pretrained LLMs (a.k.a.
'privileged information' in robotics) and dynamically adapts them with
intrinsic, natural market rewards via an LLM alignment technique we dub
"Reinforcement Learning from Market Feedback" (**RLMF**). Strong empirical
results demonstrate the efficacy of our method in adapting to regime shifts in
financial markets, a challenge that has long plagued predictive models in this
domain. The proposed algorithmic framework outperforms the best-performing SOTA
LLMs on the existing FLARE benchmark stock-movement (SM) tasks, improving
accuracy by more than 15%. On the recently proposed NIFTY SM task, our adaptive
policy outperforms the best-performing SOTA trillion-parameter models such as
GPT-4. The paper details the dual-phase, teacher-student architecture and
implementation of our model, the empirical results obtained, and an analysis of
the role of language embeddings in terms of Information Gain.
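The abstract does not define the "intrinsic, natural market reward" used by RLMF. By analogy with RLHF, such a signal presumably takes the place of a learned human-preference score. The sketch below is hypothetical and rests on several assumptions:

- The policy emits one of Rise/Fall/Neutral for an SM task.
- The reward compares that prediction with the realized next-period close-to-close move.
- The realized move is labeled with an illustrative +/-0.5% band.

The names `movement_label`, `market_feedback_reward` and `accuracy`, the threshold, and the +1/-1 scheme are all assumptions for illustration, not the paper's method.

```python
from typing import List, Literal, Sequence

# Movement classes; the Rise/Fall/Neutral framing is an assumption for this sketch.
Label = Literal["Rise", "Fall", "Neutral"]


def movement_label(prev_close: float, next_close: float, threshold: float = 0.005) -> Label:
    """Map a realized close-to-close return to a movement label.

    The +/-0.5% band is a hypothetical choice, not taken from the paper.
    """
    ret = (next_close - prev_close) / prev_close
    if ret > threshold:
        return "Rise"
    if ret < -threshold:
        return "Fall"
    return "Neutral"


def market_feedback_reward(prediction: Label, prev_close: float, next_close: float,
                           threshold: float = 0.005) -> float:
    """Hypothetical market reward: +1 if the predicted move matches the move the
    market actually made, -1 otherwise. No human labeler is involved."""
    realized = movement_label(prev_close, next_close, threshold)
    return 1.0 if prediction == realized else -1.0


def accuracy(predictions: Sequence[Label], realized: Sequence[Label]) -> float:
    """Fraction of predictions that match the realized labels."""
    if not predictions or len(predictions) != len(realized):
        raise ValueError("need equal-length, non-empty sequences")
    return sum(p == r for p, r in zip(predictions, realized)) / len(predictions)


if __name__ == "__main__":
    closes = [100.0, 101.0, 100.9, 99.0]
    preds: List[Label] = ["Rise", "Neutral", "Rise"]
    pairs = list(zip(closes, closes[1:]))
    rewards = [market_feedback_reward(p, a, b) for p, (a, b) in zip(preds, pairs)]
    realized = [movement_label(a, b) for a, b in pairs]
    print(rewards)                    # [1.0, 1.0, -1.0]
    print(accuracy(preds, realized))  # 0.6666666666666666
```

Because a reward of this kind comes from realized prices rather than from a learned preference model, it is cheap to compute but noisy. The choice of threshold also shifts the balance between the three labels.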
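The abstract analyzes language embeddings "in terms of Information Gain" but does not give the estimator. For reference, the standard definition for a discrete feature X and label Y is IG(Y; X) = H(Y) - H(Y | X). The minimal sketch below computes it empirically. It assumes the embedding-derived signal has already been discretized into buckets; that bucketing and the function names are hypothetical.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, Hashable, List, Sequence


def entropy(labels: Sequence[Hashable]) -> float:
    """Empirical Shannon entropy H(Y) in bits."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(labels: Sequence[Hashable], feature: Sequence[Hashable]) -> float:
    """IG(Y; X) = H(Y) - H(Y | X) for a discrete feature X
    (here, a hypothetical bucketed embedding signal)."""
    if not labels or len(labels) != len(feature):
        raise ValueError("need equal-length, non-empty sequences")
    groups: Dict[Hashable, List[Hashable]] = defaultdict(list)
    for y, x in zip(labels, feature):
        groups[x].append(y)
    n = len(labels)
    # H(Y | X): entropy of Y within each feature bucket, weighted by bucket frequency.
    conditional = sum(len(ys) / n * entropy(ys) for ys in groups.values())
    return entropy(labels) - conditional


if __name__ == "__main__":
    moves = ["Rise", "Rise", "Fall", "Fall"]
    print(information_gain(moves, ["pos", "pos", "neg", "neg"]))  # 1.0 (fully informative)
    print(information_gain(moves, ["a", "b", "a", "b"]))          # 0.0 (uninformative)
```

A higher IG means the feature removes more uncertainty about the next move. Applying the idea to continuous embeddings directly would require a density or mutual-information estimator rather than bucket counts.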