Risk-sensitive Markov decision processes with long-run CVaR criterion
Authors: Li Xia, Luyao Zhang, Peter W. Glynn
Journal: Production and Operations Management (impact factor 4.8; JCR Q1, Engineering, Manufacturing)
Published: 2023-09-27
DOI: https://doi.org/10.1111/poms.14077
Citations: 1
Abstract
Conditional value at risk (CVaR) is a risk metric widely used in finance. However, dynamically optimizing CVaR is difficult, because the resulting problem is not a standard Markov decision process (MDP) and the principle of dynamic programming fails. In this paper, we study the infinite-horizon discrete-time MDP with a long-run CVaR criterion from the viewpoint of sensitivity-based optimization. By introducing a pseudo-CVaR metric, we reformulate the problem as a bilevel MDP model and derive a CVaR difference formula that quantifies the difference in long-run CVaR between any two policies. The optimality of deterministic policies is derived. We obtain a so-called Bellman local optimality equation for CVaR, which is a necessary and sufficient condition for locally optimal policies but only a necessary condition for globally optimal policies. A CVaR derivative formula is also derived to provide further sensitivity information. We then develop a policy-iteration-type algorithm to optimize CVaR efficiently, which is shown to converge to a local optimum in the mixed policy space. Furthermore, based on the sensitivity analysis of our bilevel MDP formulation and its critical points, we develop a globally optimal algorithm. The piecewise linearity and segment convexity of the optimal pseudo-CVaR function are also established. Our main results and algorithms are further extended to optimize the mean and CVaR simultaneously. Finally, we conduct numerical experiments on portfolio management to demonstrate the main results. Our work sheds light on dynamically optimizing CVaR from a sensitivity viewpoint.
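As a rough illustration of why an auxiliary "pseudo" objective can help, the sketch below uses the standard Rockafellar-Uryasev representation of CVaR, in which CVaR at level alpha equals the minimum over a scalar y of y + E[(X - y)^+] / (1 - alpha). This is an assumption made for illustration: the abstract does not define the paper's pseudo-CVaR metric, and it may differ from this form. The function names `pseudo_cvar` and `cvar` are hypothetical, and the samples are treated as costs, so CVaR is the mean of the worst (1 - alpha) tail.

```python
import numpy as np

def pseudo_cvar(costs, y, alpha):
    """Rockafellar-Uryasev objective y + E[(X - y)^+] / (1 - alpha).

    Assumed form for illustration only; not taken from the paper.
    """
    costs = np.asarray(costs, dtype=float)
    return y + np.mean(np.maximum(costs - y, 0.0)) / (1.0 - alpha)

def cvar(costs, alpha):
    """Empirical CVaR obtained by minimizing the objective over y.

    The objective is convex and piecewise linear in y with kinks at the
    sample values, so it suffices to search over the samples themselves.
    """
    costs = np.asarray(costs, dtype=float)
    return min(pseudo_cvar(costs, y, alpha) for y in costs)

if __name__ == "__main__":
    sample_costs = np.arange(1, 11)      # costs 1, 2, ..., 10
    print(cvar(sample_costs, alpha=0.8)) # mean of the worst 20% of costs
```

For any fixed y the objective is an ordinary expectation of a per-sample quantity, which is the kind of structure standard MDP tools can handle; the outer minimization over y is what gives a two-level (bilevel) flavor to the problem.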
About the journal:
The mission of Production and Operations Management is to serve as the flagship research journal in operations management in manufacturing and services. The journal publishes scientific research into the problems, interests, and concerns of managers who manage product and process design, operations, and supply chains. It covers all topics in product and process design, operations, and supply chain management, and welcomes papers using any research paradigm.