{"title":"On the Optimal Deterministic Policy Learning in Chance-Constrained Markov Decision Processes","authors":"Hongyu Yi;Chenbei Lu;Chenye Wu","doi":"10.1109/LCSYS.2025.3610666","DOIUrl":null,"url":null,"abstract":"Constrained Markov Decision Processes (CMDPs) are widely used for online decision-making under constraints. However, their applicability is often limited by the reliance on expectation-based linear constraints. Chance-Constrained MDPs (CCMDPs) address this by incorporating nonlinear, probabilistic constraints, yet are often intractable and approximated via CVaR-based reformulations. In this letter, we propose a tractable framework for CCMDPs to exactly solve the best deterministic policies based on a three-stage, model-based constraint learning algorithm. Theoretically, we establish a polynomial sample complexity guarantee for feasible policy optimization using a novel distributional concentration analysis. A case study on a thermostatically controlled load demonstrates the effectiveness of our approach.","PeriodicalId":37235,"journal":{"name":"IEEE Control Systems Letters","volume":"9 ","pages":"2217-2222"},"PeriodicalIF":2.0000,"publicationDate":"2025-09-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Control Systems Letters","FirstCategoryId":"1085","ListUrlMain":"https://ieeexplore.ieee.org/document/11165110/","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"AUTOMATION & CONTROL SYSTEMS","Score":null,"Total":0}
Abstract
Constrained Markov Decision Processes (CMDPs) are widely used for online decision-making under constraints. However, their applicability is often limited by their reliance on expectation-based linear constraints. Chance-Constrained MDPs (CCMDPs) address this by incorporating nonlinear, probabilistic constraints, yet they are often intractable and are typically approximated via CVaR-based reformulations. In this letter, we propose a tractable framework for CCMDPs that exactly computes optimal deterministic policies via a three-stage, model-based constraint learning algorithm. Theoretically, we establish a polynomial sample complexity guarantee for feasible policy optimization using a novel distributional concentration analysis. A case study on a thermostatically controlled load demonstrates the effectiveness of our approach.
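For readers unfamiliar with the distinction the abstract draws, the following is a minimal sketch of the three constraint types involved, written in standard notation that is not taken from the letter itself (here c is a per-step cost, C the cumulative cost under policy π, d a budget, and ε a risk level):

```latex
% Expectation-based (linear) constraint used in standard CMDPs:
\mathbb{E}_{\pi}\!\left[\, C \,\right] \le d,
\qquad C = \sum_{t} c(s_t, a_t)

% Chance constraint used in CCMDPs (nonlinear and generally nonconvex
% in the occupancy measure, hence the intractability):
\Pr_{\pi}\!\left( C \le d \right) \ge 1 - \varepsilon

% Conservative CVaR surrogate: since CVaR upper-bounds the corresponding
% value-at-risk, satisfying the CVaR constraint implies the chance
% constraint, but the converse need not hold:
\mathrm{CVaR}_{\varepsilon}^{\pi}(C) \le d
\;\Longrightarrow\;
\Pr_{\pi}\!\left( C > d \right) \le \varepsilon
```

The gap in the last implication is what makes CVaR-based reformulations approximate rather than exact, which is the motivation the abstract gives for solving the chance-constrained problem directly.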