{"title":"A primal–dual policy iteration algorithm for constrained Markov decision processes","authors":"Zeyu Liu , Xueping Li , Anahita Khojandi","doi":"10.1016/j.ejor.2025.08.038","DOIUrl":null,"url":null,"abstract":"<div><div>The solution algorithms of Constrained Markov Decision Process (CMDP), a widely adopted model for sequential decision-making, have been intensively studied in the literature. Despite increasing effort, the Linear Programming (LP) formulation of CMDP remains the dominant exact method that leads to the optimal solution without constraint violations. However, the LP formulation is computationally inefficient due to the curse of dimensionality in CMDP state and action spaces. In this study, we introduce a novel policy iteration method for CMDP, based on decomposition and row-generation techniques. We design a Primal–Dual Policy Iteration (PDPI) algorithm that utilizes state values and Lagrangian multipliers to improve randomized stationary policies in an iterative fashion. We analytically show that upon convergence, PDPI produces the optimal solution for CMDP. An upper bound of the convergence iterations is also given. To validate the algorithm performance, we conduct comprehensive computational experiments on six benchmarking problems curated from the literature. Results show that PDPI outperforms conventional methods considerably, improving the total algorithm runtime by up to 89.19%. The improvement becomes more significant as the problem size grows larger. We further provide insights and discuss the impact of the developed method.</div></div>","PeriodicalId":55161,"journal":{"name":"European Journal of Operational Research","volume":"328 1","pages":"Pages 174-188"},"PeriodicalIF":6.0000,"publicationDate":"2025-08-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"European Journal of Operational Research","FirstCategoryId":"91","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0377221725006757","RegionNum":2,"RegionCategory":"管理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"OPERATIONS RESEARCH & MANAGEMENT SCIENCE","Score":null,"Total":0}
Citations: 0
Abstract
Solution algorithms for the Constrained Markov Decision Process (CMDP), a widely adopted model for sequential decision-making, have been studied intensively in the literature. Despite this effort, the Linear Programming (LP) formulation of the CMDP remains the dominant exact method that yields the optimal solution without constraint violations. However, the LP formulation is computationally inefficient due to the curse of dimensionality in the CMDP state and action spaces. In this study, we introduce a novel policy iteration method for the CMDP based on decomposition and row-generation techniques. We design a Primal–Dual Policy Iteration (PDPI) algorithm that uses state values and Lagrange multipliers to improve randomized stationary policies iteratively. We show analytically that, upon convergence, PDPI produces the optimal solution to the CMDP, and we give an upper bound on the number of iterations to convergence. To validate the algorithm's performance, we conduct comprehensive computational experiments on six benchmark problems curated from the literature. The results show that PDPI outperforms conventional methods considerably, reducing total algorithm runtime by up to 89.19%, with the improvement growing as the problem size increases. We further provide insights and discuss the impact of the developed method.
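For context, the two standard ingredients the abstract refers to can be sketched in our own notation, which is not necessarily the paper's. First, the exact LP for a discounted CMDP optimizes over the occupation measure x(s,a), the expected discounted state–action visitation frequencies:

```latex
\begin{align*}
\max_{x \ge 0} \quad & \sum_{s,a} r(s,a)\, x(s,a) \\
\text{s.t.} \quad & \sum_{a} x(s',a) \;=\; \mu_0(s') + \gamma \sum_{s,a} P(s' \mid s,a)\, x(s,a) && \forall s', \\
& \sum_{s,a} c_k(s,a)\, x(s,a) \;\le\; d_k && \forall k.
\end{align*}
\end{document-free note: the optimal (generally randomized) stationary policy is recovered as } \pi^*(a \mid s) = x^*(s,a) \big/ \textstyle\sum_{a'} x^*(s,a').
```

The number of variables and flow-balance constraints grows with the number of state–action pairs, which is the computational bottleneck the paper targets. Second, the sketch below shows a generic Lagrangian primal–dual loop built on exact policy iteration. It is not the paper's PDPI algorithm (which relies on decomposition and row generation and recovers the optimal randomized policy); it is only a textbook baseline illustrating how state values and Lagrange multipliers interact. The function names, step size, and stopping rule are our own assumptions.

```python
import numpy as np

def lagrangian_primal_dual_pi(P, r, c, d, gamma=0.95, mu0=None,
                              step=0.5, iters=200, tol=1e-8):
    """Textbook Lagrangian primal-dual loop for a small discounted CMDP.

    P : (S, A, S) transition tensor, P[s, a, s'] = Pr(s' | s, a)
    r : (S, A) reward matrix
    c : (K, S, A) constraint-cost tensors
    d : (K,) constraint budgets
    Returns the last deterministic policy and the Lagrange multipliers.
    """
    S, A = r.shape
    K = d.shape[0]
    mu0 = np.full(S, 1.0 / S) if mu0 is None else mu0
    lam = np.zeros(K)  # dual variables, one per constraint

    def policy_iteration(r_eff):
        """Exact policy iteration on the unconstrained MDP with reward r_eff."""
        pi = np.zeros(S, dtype=int)
        while True:
            # Policy evaluation: solve (I - gamma * P_pi) V = r_pi.
            P_pi = P[np.arange(S), pi]          # (S, S)
            r_pi = r_eff[np.arange(S), pi]      # (S,)
            V = np.linalg.solve(np.eye(S) - gamma * P_pi, r_pi)
            # Policy improvement via one-step lookahead.
            pi_new = (r_eff + gamma * P @ V).argmax(axis=1)
            if np.array_equal(pi_new, pi):
                return pi, V
            pi = pi_new

    for _ in range(iters):
        # Primal step: best deterministic response to the current multipliers,
        # i.e., policy iteration on the penalized reward r - sum_k lam_k * c_k.
        r_eff = r - np.tensordot(lam, c, axes=1)
        pi, _ = policy_iteration(r_eff)

        # Expected discounted cost of pi under mu0 for each constraint.
        P_pi = P[np.arange(S), pi]
        costs = np.array([
            mu0 @ np.linalg.solve(np.eye(S) - gamma * P_pi,
                                  c[k, np.arange(S), pi])
            for k in range(K)
        ])

        # Dual step: projected subgradient ascent on the multipliers.
        lam_new = np.maximum(0.0, lam + step * (costs - d))
        if np.max(np.abs(lam_new - lam)) < tol:
            return pi, lam_new
        lam = lam_new
    return pi, lam
```

Because the primal step always returns a deterministic policy, this baseline can oscillate between policies and generally cannot express the randomized policy that is optimal for a CMDP; the occupation-measure LP above, and by the abstract's account PDPI, do not share that limitation.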
About the journal:
The European Journal of Operational Research (EJOR) publishes high quality, original papers that contribute to the methodology of operational research (OR) and to the practice of decision making.