Continuous-time optimal investment with portfolio constraints: A reinforcement learning approach

IF 6 2区 管理学 Q1 OPERATIONS RESEARCH & MANAGEMENT SCIENCE
Huy Chau , Duy Nguyen , Thai Nguyen
{"title":"Continuous-time optimal investment with portfolio constraints: A reinforcement learning approach","authors":"Huy Chau ,&nbsp;Duy Nguyen ,&nbsp;Thai Nguyen","doi":"10.1016/j.ejor.2025.08.032","DOIUrl":null,"url":null,"abstract":"<div><div>In a reinforcement learning (RL) framework, we study the exploratory version of the continuous time expected utility (EU) maximization problem with a portfolio constraint that includes widely-used financial regulations such as short-selling constraints and borrowing prohibition. The optimal feedback policy of the exploratory unconstrained classical EU problem is shown to be Gaussian. In the case where the portfolio weight is constrained to a given interval, the corresponding constrained optimal exploratory policy follows a truncated Gaussian distribution. We verify that the closed form optimal solution obtained for logarithmic utility and quadratic utility for both unconstrained and constrained situations converge to the non-exploratory expected utility counterpart when the exploration weight goes to zero. Finally, we establish a policy improvement theorem and devise an implementable reinforcement learning algorithm by casting the optimal problem in a martingale framework. Our numerical examples show that exploration leads to an optimal wealth process that is more dispersedly distributed with heavier tail compared to that of the case without exploration. This effect becomes less significant as the exploration parameter is smaller. Moreover, the numerical implementation also confirms the intuitive understanding that a broader domain of investment opportunities necessitates a higher exploration cost. Notably, when subjected to both short-selling and money borrowing constraints, the exploration cost becomes negligible compared to the unconstrained case.</div></div>","PeriodicalId":55161,"journal":{"name":"European Journal of Operational Research","volume":"328 3","pages":"Pages 1068-1092"},"PeriodicalIF":6.0000,"publicationDate":"2025-09-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"European Journal of Operational Research","FirstCategoryId":"91","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S037722172500671X","RegionNum":2,"RegionCategory":"管理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"OPERATIONS RESEARCH & MANAGEMENT SCIENCE","Score":null,"Total":0}
引用次数: 0

Abstract

In a reinforcement learning (RL) framework, we study the exploratory version of the continuous time expected utility (EU) maximization problem with a portfolio constraint that includes widely-used financial regulations such as short-selling constraints and borrowing prohibition. The optimal feedback policy of the exploratory unconstrained classical EU problem is shown to be Gaussian. In the case where the portfolio weight is constrained to a given interval, the corresponding constrained optimal exploratory policy follows a truncated Gaussian distribution. We verify that the closed form optimal solution obtained for logarithmic utility and quadratic utility for both unconstrained and constrained situations converge to the non-exploratory expected utility counterpart when the exploration weight goes to zero. Finally, we establish a policy improvement theorem and devise an implementable reinforcement learning algorithm by casting the optimal problem in a martingale framework. Our numerical examples show that exploration leads to an optimal wealth process that is more dispersedly distributed with heavier tail compared to that of the case without exploration. This effect becomes less significant as the exploration parameter is smaller. Moreover, the numerical implementation also confirms the intuitive understanding that a broader domain of investment opportunities necessitates a higher exploration cost. Notably, when subjected to both short-selling and money borrowing constraints, the exploration cost becomes negligible compared to the unconstrained case.
具有投资组合约束的连续时间最优投资:一种强化学习方法
在强化学习(RL)框架中,我们研究了具有投资组合约束的连续时间期望效用(EU)最大化问题的探索性版本,该约束包括卖空约束和借款禁令等广泛使用的金融法规。研究了探索性无约束经典EU问题的最优反馈策略是高斯型的。在组合权重被约束于给定区间的情况下,相应的约束最优探索策略服从截断高斯分布。我们验证了在无约束和有约束情况下对数效用和二次效用的封闭形式最优解收敛于非探索性期望效用对应物,当勘探权为零时。最后,我们建立了一个策略改进定理,并通过将最优问题投射到鞅框架中,设计了一个可实现的强化学习算法。我们的数值算例表明,与不进行勘探的情况相比,勘探导致的最优财富过程分布更分散,尾部更重。这种影响随着勘探参数的减小而减小。此外,数值计算也证实了一个直观的认识,即更广泛的投资机会领域需要更高的勘探成本。值得注意的是,当同时受到卖空和借贷约束时,与不受约束的情况相比,勘探成本变得可以忽略不计。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
European Journal of Operational Research
European Journal of Operational Research 管理科学-运筹学与管理科学
CiteScore
11.90
自引率
9.40%
发文量
786
审稿时长
8.2 months
期刊介绍: The European Journal of Operational Research (EJOR) publishes high quality, original papers that contribute to the methodology of operational research (OR) and to the practice of decision making.
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:604180095
Book学术官方微信