On the Convex Formulations of Robust Markov Decision Processes

IF 1.4; CAS Zone 3 (Mathematics); JCR Q2 (MATHEMATICS, APPLIED)

Mathematics of Operations Research; Pub Date: 2024-07-16; DOI: 10.1287/moor.2022.0284

Julien Grand-Clément, Marek Petrik


Citations: 0

Abstract

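Robust Markov decision processes (RMDPs) are used for dynamic optimization in uncertain environments and have been studied extensively. Many of the main properties and algorithms of MDPs, such as value iteration and policy iteration, extend directly to RMDPs. Surprisingly, there is no known analog of the MDP convex optimization formulation for solving RMDPs. This work describes the first convex optimization formulation of RMDPs under the classical sa-rectangularity and s-rectangularity assumptions. By using entropic regularization and an exponential change of variables, we derive a convex formulation whose numbers of variables and constraints are polynomial in the numbers of states and actions, but whose constraints have large coefficients. We further simplify the formulation for RMDPs with polyhedral, ellipsoidal, or entropy-based uncertainty sets, showing that in these cases RMDPs can be reformulated as conic programs based on exponential cones, quadratic cones, and nonnegative orthants. Our work opens a new research direction for RMDPs and can serve as a first step toward obtaining a tractable convex formulation of RMDPs.

Funding: The work in the paper was supported, in part, by NSF [Grants 2144601 and 1815275] and Agence Nationale de la Recherche [Grant 11-LABX-0047].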
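The abstract notes that value iteration carries over directly from MDPs to RMDPs. As a minimal sketch of what that means under sa-rectangularity, the Python below runs robust value iteration with the update v(s) = max_a [ r(s,a) + gamma * min over p in U(s,a) of p.v ]. It assumes, purely for illustration, that each U(s,a) is a polyhedral L1 ball {p in the simplex : ||p - p_hat||_1 <= theta} around a nominal distribution p_hat; the function names, the radius theta, and the toy data are hypothetical. This is the classical iterative method, not the convex formulation derived in the paper.

```python
def worst_case_l1(p_hat, v, theta):
    """Minimize p.v over {p in simplex : ||p - p_hat||_1 <= theta}.

    Assumed L1-ball uncertainty set (illustrative choice). The greedy optimum
    moves up to theta/2 probability mass onto the lowest-value state, taking
    it from the highest-value states first.
    """
    p = list(p_hat)
    n = len(v)
    i_min = min(range(n), key=lambda i: v[i])
    eps = min(theta / 2.0, 1.0 - p[i_min])  # mass that can be shifted
    p[i_min] += eps
    # Remove the same amount of mass from the highest-value states.
    for j in sorted(range(n), key=lambda i: v[i], reverse=True):
        if eps <= 0:
            break
        if j == i_min:
            continue
        take = min(eps, p[j])
        p[j] -= take
        eps -= take
    return sum(pi * vi for pi, vi in zip(p, v))


def robust_value_iteration(r, P_hat, theta, gamma=0.9, tol=1e-8, max_iter=10_000):
    """Hypothetical sa-rectangular robust value iteration.

    r[s][a]: reward; P_hat[s][a]: nominal next-state distribution;
    theta: L1 radius shared by all (s, a) pairs (an assumption).
    """
    n_states, n_actions = len(r), len(r[0])
    v = [0.0] * n_states
    policy = [0] * n_states
    for _ in range(max_iter):
        v_new = []
        for s in range(n_states):
            # Robust Q-values: nature picks the worst kernel in U(s, a).
            q = [r[s][a] + gamma * worst_case_l1(P_hat[s][a], v, theta)
                 for a in range(n_actions)]
            best = max(range(n_actions), key=lambda a: q[a])
            policy[s] = best
            v_new.append(q[best])
        gap = max(abs(x - y) for x, y in zip(v_new, v))
        v = v_new
        if gap < tol:
            break
    return v, policy


# Toy example: state 0 pays 1 and nominally stays put; state 1 is absorbing
# with reward 0. With theta = 0.2, the adversary leaks 0.1 mass to state 1.
r = [[1.0], [0.0]]
P_hat = [[[1.0, 0.0]], [[0.0, 1.0]]]
v, policy = robust_value_iteration(r, P_hat, theta=0.2)
print(v)
```

In this toy example the fixed point for state 0 solves v0 = 1 + 0.9 * (0.9 * v0), so v0 = 1/0.19, roughly 5.26, compared with 10 for the nominal MDP (theta = 0). The robust operator remains a gamma-contraction, which is why the iteration converges just as in the non-robust case.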




Source journal

Mathematics of Operations Research (Management Science; Applied Mathematics)

CiteScore: 3.40
Self-citation rate: 5.90%
Articles published: 178
Review time: 15.0 months

Journal description: Mathematics of Operations Research is an international journal of the Institute for Operations Research and the Management Sciences (INFORMS). The journal invites articles concerned with the mathematical and computational foundations in the areas of continuous, discrete, and stochastic optimization; mathematical programming; dynamic programming; stochastic processes; stochastic models; simulation methodology; control and adaptation; networks; game theory; and decision theory. Also sought are contributions to learning theory and machine learning that have special relevance to decision making, operations research, and management science. The emphasis is on originality, quality, and importance; correctness alone is not sufficient. Significant developments in operations research and management science not having substantial mathematical interest should be directed to other journals such as Management Science or Operations Research.