Title: Reinforcement learning with temporal and variable dependency-aware transformer for stock trading optimization
Authors: Yifan Li, Xu Dong, Zhuang Wu, Jing Gao, Tianqi Zhang, Lina Yu
Journal: Neural Networks, volume 192, Article 107905 (impact factor 6.3; JCR Q1, Computer Science, Artificial Intelligence)
Publication type: Journal Article
Publication date: 2025-07-23
DOI: 10.1016/j.neunet.2025.107905
URL: https://www.sciencedirect.com/science/article/pii/S0893608025007865
Citation count: 0
Abstract
Stock trading optimization seeks to optimize portfolios in dynamic market environments and plays a crucial role in practical financial decision-making. With the rise of the Transformer in recent years, some researchers have combined it with Reinforcement Learning (RL) to better represent latent patterns in market data. However, existing methods focus mainly on temporal dependencies and fail to model the interactions among multiple variables effectively, which limits the decision-making information available for policy learning in RL. To this end, this paper proposes an RL model that integrates a Temporal and Variable Dependency-aware Transformer to learn diverse dependency relationships in market data. First, a short-term prediction module and a long-term prediction module are designed to explore potential dependencies in the market data over short-term and long-term horizons, respectively. The core of both prediction modules is the Temporal and Variable Dependency-aware Transformer, which is implemented in two stages: the first stage captures temporal relationships along the temporal dimension, and the second stage captures multivariate correlations across the variable dimension. Meanwhile, a relation representation module is proposed to further capture correlations among the different stock assets within a market. Finally, a policy decision module fuses the representations from the preceding modules into a unified space, enabling RL to learn flexible policies with comprehensive decision-making information. Experimental results show that the proposed method achieves the highest Sharpe ratio of 1.48 and portfolio return of 2.65, outperforming state-of-the-art methods on three challenging datasets: CSI-300, S&P-100, and NASDAQ-100.
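The two-stage idea in the abstract (attention along the temporal dimension, then attention across the variable dimension) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `self_attention` and `two_stage_attention`, the input layout `x[t][v]`, and the use of identity query/key/value projections in place of learned weights are all assumptions made for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    """Scaled dot-product self-attention over a list of d-dimensional vectors.

    Assumption: identity Q/K/V projections (no learned weights), so each
    output vector is a convex combination of the input vectors.
    """
    d = len(tokens[0])
    out = []
    for q in tokens:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in tokens]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, tokens)) for j in range(d)])
    return out


def two_stage_attention(x):
    """Apply temporal attention, then variable attention.

    x[t][v] is a d-dimensional embedding of variable v at time step t
    (hypothetical layout). The output has the same shape as the input.
    """
    num_steps, num_vars = len(x), len(x[0])

    # Stage 1: for each variable, attend across time steps.
    stage1 = [[None] * num_vars for _ in range(num_steps)]
    for v in range(num_vars):
        attended = self_attention([x[t][v] for t in range(num_steps)])
        for t in range(num_steps):
            stage1[t][v] = attended[t]

    # Stage 2: for each time step, attend across variables.
    return [self_attention(stage1[t]) for t in range(num_steps)]


# Toy input: 4 time steps, 3 variables, 2-dimensional embeddings.
x = [[[float(t + v), float(t * v)] for v in range(3)] for t in range(4)]
y = two_stage_attention(x)
```

In a trained model each stage would presumably use learned projections, multiple heads, and feed-forward layers; the sketch only shows how the same attention operation is applied first along one axis and then along the other.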
About the journal:
Neural Networks is a platform that aims to foster an international community of scholars and practitioners interested in neural networks, deep learning, and other approaches to artificial intelligence and machine learning. Our journal invites submissions covering various aspects of neural networks research, from computational neuroscience and cognitive modeling to mathematical analyses and engineering applications. By providing a forum for interdisciplinary discussions between biology and technology, we aim to encourage the development of biologically-inspired artificial intelligence.