Continuous-time optimal investment with portfolio constraints: A reinforcement learning approach

IF 6 | Zone 2 (Management) | Q1 OPERATIONS RESEARCH & MANAGEMENT SCIENCE

European Journal of Operational Research | Pub Date: 2025-09-17 | DOI: 10.1016/j.ejor.2025.08.032

Huy Chau, Duy Nguyen, Thai Nguyen

European Journal of Operational Research, Volume 328, Issue 3, Pages 1068-1092. Journal Article. Not open access.
https://www.sciencedirect.com/science/article/pii/S037722172500671X

Citations: 0

Abstract

In a reinforcement learning (RL) framework, we study the exploratory version of the continuous-time expected utility (EU) maximization problem with a portfolio constraint that covers widely used financial regulations such as short-selling constraints and borrowing prohibitions. The optimal feedback policy of the exploratory unconstrained classical EU problem is shown to be Gaussian. When the portfolio weight is constrained to a given interval, the corresponding constrained optimal exploratory policy follows a truncated Gaussian distribution. We verify that the closed-form optimal solutions obtained for logarithmic and quadratic utility, in both the unconstrained and constrained cases, converge to their non-exploratory expected utility counterparts as the exploration weight goes to zero. Finally, we establish a policy improvement theorem and devise an implementable reinforcement learning algorithm by casting the optimization problem in a martingale framework. Our numerical examples show that exploration leads to an optimal wealth process that is more widely dispersed, with a heavier tail, than in the case without exploration. This effect becomes less significant as the exploration parameter decreases. Moreover, the numerical implementation confirms the intuition that a broader domain of investment opportunities necessitates a higher exploration cost. Notably, under both short-selling and money-borrowing constraints, the exploration cost becomes negligible compared to the unconstrained case.
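To make the Gaussian and truncated Gaussian policies concrete, here is a minimal Python sketch. It is not the paper's algorithm. It assumes a single risky asset with drift `mu` and volatility `sigma`, a risk-free rate `r`, logarithmic utility, and an entropy-regularized objective with exploration weight `lam`. Under those assumptions, a standard pointwise maximization gives a Gaussian policy over the portfolio weight with mean `(mu - r) / sigma**2` (the classical Merton fraction) and variance `lam / sigma**2`; restricting the weight to an interval truncates that density. The paper's exact parameterization may differ, and all function and parameter names below are hypothetical.

```python
import math
import random
from statistics import NormalDist


def gaussian_policy_params(mu, r, sigma, lam):
    """Mean and std of the ASSUMED Gaussian exploratory policy (log utility).

    Assumption (not taken from the paper): mean = (mu - r) / sigma^2 and
    variance = lam / sigma^2, where lam is the exploration weight.
    """
    mean = (mu - r) / sigma**2      # classical Merton fraction
    std = math.sqrt(lam) / sigma    # spread shrinks to zero as lam -> 0
    return mean, std


def sample_truncated_gaussian(mean, std, low, high, rng):
    """Draw one portfolio weight from N(mean, std^2) truncated to [low, high].

    Uses inverse-CDF sampling: pick a uniform probability between the CDF
    values at the two bounds, then map it back through the inverse CDF.
    """
    base = NormalDist(mean, std)
    p_low, p_high = base.cdf(low), base.cdf(high)
    p = rng.uniform(p_low, p_high)
    # inv_cdf requires 0 < p < 1, so guard against round-off at the ends.
    p = min(max(p, 1e-12), 1.0 - 1e-12)
    # Final clamp keeps the draw inside the constraint interval.
    return min(max(base.inv_cdf(p), low), high)


if __name__ == "__main__":
    rng = random.Random(0)
    mean, std = gaussian_policy_params(mu=0.08, r=0.02, sigma=0.3, lam=0.01)
    # Weights restricted to [0, 1]: no short-selling and no borrowing.
    weights = [sample_truncated_gaussian(mean, std, 0.0, 1.0, rng) for _ in range(5)]
    print(mean, std, weights)
```

In this sketch, the interval `[0, 1]` stands for the combined short-selling and borrowing constraints. Letting `lam` shrink concentrates the samples around the Merton fraction, or around the nearest endpoint when that fraction lies outside the interval, which mirrors the convergence to the non-exploratory solution described in the abstract.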


Source journal

European Journal of Operational Research (Management Science: Operations Research & Management Science)

CiteScore

11.90

Self-citation rate

9.40%

Articles published

786

Review time

8.2 months

Journal description: The European Journal of Operational Research (EJOR) publishes high-quality, original papers that contribute to the methodology of operational research (OR) and to the practice of decision making.