PortfolioZero: A stock portfolio model based on deep reinforcement learning

IF 7.2 | Zone 1, Computer Science | JCR Q1, COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Applied Soft Computing Pub Date : 2025-07-23 DOI:10.1016/j.asoc.2025.113578

Haifeng Li, Mo Hai

Applied Soft Computing, Volume 183, Article 113578. https://www.sciencedirect.com/science/article/pii/S1568494625008890

Citations: 0

Abstract

Current portfolio research mainly uses reinforcement learning to build models that aim for high investment returns while minimizing the risks posed by market uncertainty. Two main issues remain. First, the complexity of financial markets makes it difficult to capture patterns in asset price changes. Second, current research assumes that stock prices accurately reflect all asset information and that historical prices alone can predict future trends, although numerous external factors also influence future movements. We introduce PortfolioZero, a novel model that addresses these problems. PortfolioZero combines three connected deep neural networks with Monte Carlo Tree Search to discover patterns in financial assets. In the representation network, a Transformer-based model embeds financial price data to capture temporal dynamics and potential correlations, providing richer feature representations; the prediction network and Monte Carlo Tree Search are redesigned to handle the continuous action space. Furthermore, we use the StructBERT model to process financial text data, converting market information into sentiment scores, which are used to construct two reward functions that capture dynamic changes in the financial market. In experiments on the China A-share market, we compared our model with traditional portfolio methods and state-of-the-art deep reinforcement learning algorithms. PortfolioZero achieved an average annualized return of 21.21% across three portfolio types, outperforming SARL by 20.64% and DDPG by 41.97%, while the sentiment-enhanced reward functions improved the average annualized return by 35% compared with the basic reward.
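The abstract does not give the reward functions or the return calculation, so the following is only a minimal sketch of how a sentiment-adjusted reward and an annualized return could be computed. The function names, the additive form of the sentiment term, the weight `beta`, the sentiment range of [-1, 1], and the 252-periods-per-year convention are all assumptions made for illustration, not the authors' formulation.

```python
import math


def portfolio_log_return(weights, price_relatives):
    """Log return of one period: log(sum_i w_i * y_i).

    weights: portfolio weights that sum to 1.
    price_relatives: closing price divided by previous closing price, per asset.
    """
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    growth = sum(w * y for w, y in zip(weights, price_relatives))
    return math.log(growth)


def sentiment_adjusted_reward(log_return, weights, sentiments, beta=0.1):
    """Hypothetical reward: log return plus beta times weighted sentiment.

    sentiments: one score per asset, assumed to lie in [-1, 1].
    beta: assumed trade-off weight, not taken from the paper.
    """
    exposure = sum(w * s for w, s in zip(weights, sentiments))
    return log_return + beta * exposure


def annualized_return(period_log_returns, periods_per_year=252):
    """Geometric annualization of per-period log returns.

    periods_per_year=252 is a common trading-day convention (an assumption).
    """
    mean_log = sum(period_log_returns) / len(period_log_returns)
    return math.exp(mean_log * periods_per_year) - 1.0
```

With equal weights on two assets whose price relatives are 1.02 and 0.98, the portfolio value is unchanged, so the log return is zero and the reward in this sketch comes entirely from the sentiment term.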




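Source journal

Applied Soft Computing (Engineering and Technology - Computer Science: Interdisciplinary Applications)

CiteScore: 15.80

Self-citation rate: 6.90%

Articles published: 874

Review time: 10.9 months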

About the journal: Applied Soft Computing is an international journal promoting an integrated view of soft computing to solve real-life problems. The focus is to publish the highest-quality research on the application and convergence of Fuzzy Logic, Neural Networks, Evolutionary Computing, Rough Sets, and other similar techniques to address real-world complexities. Applied Soft Computing is a rolling publication: articles are published as soon as the editor-in-chief has accepted them. The website is therefore continuously updated with new articles, and the publication time is short.