Reinforcement learning for continuous-time mean-variance portfolio selection in a regime-switching market

IF 1.9 · JCR Q2 (Economics) · CAS Tier 3
Bo Wu, Lingfei Li
DOI: 10.1016/j.jedc.2023.104787
Journal: Journal of Economic Dynamics & Control
Published: 2023-11-10 (Journal Article)
Full text: https://www.sciencedirect.com/science/article/pii/S0165188923001938
Citations: 0

Abstract


We propose a reinforcement learning (RL) approach to solve the continuous-time mean-variance portfolio selection problem in a regime-switching market, where the market regime is unobservable. To encourage exploration for learning, we formulate an exploratory stochastic control problem with an entropy-regularized mean-variance objective. We obtain semi-analytical representations of the optimal value function and optimal policy, which involve unknown solutions to two linear parabolic partial differential equations (PDEs). We utilize these representations to parametrize the value function and policy for learning with the unknown solutions to the PDEs approximated based on polynomials. We develop an actor-critic RL algorithm to learn the optimal policy through interactions with the market environment. The algorithm carries out filtering to obtain the belief probability of the market regime and performs policy evaluation and policy gradient updates alternately. Empirical results demonstrate the advantages of our RL algorithm in relatively long-term investment problems over the classical control approach and an RL algorithm developed for the continuous-time mean-variance problem without considering regime switches.
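The filtering step described above — updating the belief probability of the hidden market regime from observed returns — can be sketched generically for a two-regime market. This is a minimal Euler-discretized Wonham-type filter update, not the paper's actual algorithm; the drifts `mu`, volatility `sigma`, and transition intensities `q` below are hypothetical illustrative values.

```python
import numpy as np

def update_belief(p, dR, dt, mu=(0.10, 0.02), sigma=0.2, q=(0.5, 0.5)):
    """One Euler step of a Wonham-type filter for the belief probability
    p = P(regime 1 | returns observed so far) in a two-regime market.

    p     : current belief that the market is in regime 1
    dR    : return increment of the risky asset observed over dt
    mu    : drifts in regime 1 and regime 2 (hypothetical values)
    sigma : common volatility across regimes
    q     : (q12, q21) regime transition intensities
    """
    mu1, mu2 = mu
    q12, q21 = q
    # predictable drift of the belief, driven by regime transitions
    drift = q21 * (1.0 - p) - q12 * p
    # innovation: observed return minus its conditional expectation
    innov = dR - (p * mu1 + (1.0 - p) * mu2) * dt
    # Euler step; clip to keep the belief a valid probability
    p_new = p + drift * dt + p * (1.0 - p) * (mu1 - mu2) / sigma**2 * innov
    return float(np.clip(p_new, 0.0, 1.0))
```

A return above its conditional expectation shifts the belief toward the high-drift regime, and a return below it shifts the belief the other way; the actor-critic loop in the paper would then condition the policy on this filtered belief rather than on the unobservable regime itself.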

Source journal metrics
CiteScore: 3.10
Self-citation rate: 10.50%
Articles published: 199
Journal description: The journal provides an outlet for publication of research concerning all theoretical and empirical aspects of economic dynamics and control, as well as the development and use of computational methods in economics and finance. Contributions regarding computational methods may include, but are not restricted to, artificial intelligence, databases, decision support systems, genetic algorithms, modelling languages, neural networks, numerical algorithms for optimization, control and equilibria, parallel computing and qualitative reasoning.