Reinforcement learning for continuous-time mean-variance portfolio selection in a regime-switching market

IF 1.9 · JCR Q2 (Economics) · CAS Tier 3
Bo Wu, Lingfei Li
DOI: 10.1016/j.jedc.2023.104787
Journal: Journal of Economic Dynamics & Control
Published: 2023-11-10 (Journal Article)
Full text: https://www.sciencedirect.com/science/article/pii/S0165188923001938
Citations: 0

Abstract


We propose a reinforcement learning (RL) approach to solve the continuous-time mean-variance portfolio selection problem in a regime-switching market, where the market regime is unobservable. To encourage exploration for learning, we formulate an exploratory stochastic control problem with an entropy-regularized mean-variance objective. We obtain semi-analytical representations of the optimal value function and optimal policy, which involve unknown solutions to two linear parabolic partial differential equations (PDEs). We utilize these representations to parametrize the value function and policy for learning with the unknown solutions to the PDEs approximated based on polynomials. We develop an actor-critic RL algorithm to learn the optimal policy through interactions with the market environment. The algorithm carries out filtering to obtain the belief probability of the market regime and performs policy evaluation and policy gradient updates alternately. Empirical results demonstrate the advantages of our RL algorithm in relatively long-term investment problems over the classical control approach and an RL algorithm developed for the continuous-time mean-variance problem without considering regime switches.
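The filtering step described above — updating the belief probability of the hidden market regime from observed returns — can be sketched generically for a two-regime market. This is a minimal Euler-discretized Wonham-type filter update, not the paper's actual algorithm; the drifts `mu`, volatility `sigma`, and transition intensities `q` below are hypothetical illustrative values.

```python
import numpy as np

def update_belief(p, dR, dt, mu=(0.10, 0.02), sigma=0.2, q=(0.5, 0.5)):
    """One Euler step of a Wonham-type filter for the belief probability
    p = P(regime 1 | returns observed so far) in a two-regime market.

    p     : current belief that the market is in regime 1
    dR    : return increment of the risky asset observed over dt
    mu    : drifts in regime 1 and regime 2 (hypothetical values)
    sigma : common volatility across regimes
    q     : (q12, q21) regime transition intensities
    """
    mu1, mu2 = mu
    q12, q21 = q
    # predictable drift of the belief, driven by regime transitions
    drift = q21 * (1.0 - p) - q12 * p
    # innovation: observed return minus its conditional expectation
    innov = dR - (p * mu1 + (1.0 - p) * mu2) * dt
    # Euler step; clip to keep the belief a valid probability
    p_new = p + drift * dt + p * (1.0 - p) * (mu1 - mu2) / sigma**2 * innov
    return float(np.clip(p_new, 0.0, 1.0))
```

A return above its conditional expectation shifts the belief toward the high-drift regime, and a return below it shifts the belief the other way; the actor-critic loop in the paper would then condition the policy on this filtered belief rather than on the unobservable regime itself.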

Source journal metrics
CiteScore: 3.10
Self-citation rate: 10.50%
Articles published: 199
Journal description: The journal provides an outlet for publication of research concerning all theoretical and empirical aspects of economic dynamics and control, as well as the development and use of computational methods in economics and finance. Contributions regarding computational methods may include, but are not restricted to, artificial intelligence, databases, decision support systems, genetic algorithms, modelling languages, neural networks, numerical algorithms for optimization, control and equilibria, parallel computing and qualitative reasoning.