{"title":"机会约束马尔可夫决策过程中的最优确定性策略学习","authors":"Hongyu Yi;Chenbei Lu;Chenye Wu","doi":"10.1109/LCSYS.2025.3610666","DOIUrl":null,"url":null,"abstract":"Constrained Markov Decision Processes (CMDPs) are widely used for online decision-making under constraints. However, their applicability is often limited by the reliance on expectation-based linear constraints. Chance-Constrained MDPs (CCMDPs) address this by incorporating nonlinear, probabilistic constraints, yet are often intractable and approximated via CVaR-based reformulations. In this letter, we propose a tractable framework for CCMDPs to exactly solve the best deterministic policies based on a three-stage, model-based constraint learning algorithm. Theoretically, we establish a polynomial sample complexity guarantee for feasible policy optimization using a novel distributional concentration analysis. A case study on a thermostatically controlled load demonstrates the effectiveness of our approach.","PeriodicalId":37235,"journal":{"name":"IEEE Control Systems Letters","volume":"9 ","pages":"2217-2222"},"PeriodicalIF":2.0000,"publicationDate":"2025-09-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"On the Optimal Deterministic Policy Learning in Chance-Constrained Markov Decision Processes\",\"authors\":\"Hongyu Yi;Chenbei Lu;Chenye Wu\",\"doi\":\"10.1109/LCSYS.2025.3610666\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Constrained Markov Decision Processes (CMDPs) are widely used for online decision-making under constraints. However, their applicability is often limited by the reliance on expectation-based linear constraints. Chance-Constrained MDPs (CCMDPs) address this by incorporating nonlinear, probabilistic constraints, yet are often intractable and approximated via CVaR-based reformulations. In this letter, we propose a tractable framework for CCMDPs to exactly solve the best deterministic policies based on a three-stage, model-based constraint learning algorithm. Theoretically, we establish a polynomial sample complexity guarantee for feasible policy optimization using a novel distributional concentration analysis. A case study on a thermostatically controlled load demonstrates the effectiveness of our approach.\",\"PeriodicalId\":37235,\"journal\":{\"name\":\"IEEE Control Systems Letters\",\"volume\":\"9 \",\"pages\":\"2217-2222\"},\"PeriodicalIF\":2.0000,\"publicationDate\":\"2025-09-16\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"IEEE Control Systems Letters\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://ieeexplore.ieee.org/document/11165110/\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q2\",\"JCRName\":\"AUTOMATION & CONTROL SYSTEMS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Control Systems Letters","FirstCategoryId":"1085","ListUrlMain":"https://ieeexplore.ieee.org/document/11165110/","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"AUTOMATION & CONTROL SYSTEMS","Score":null,"Total":0}
On the Optimal Deterministic Policy Learning in Chance-Constrained Markov Decision Processes
Constrained Markov Decision Processes (CMDPs) are widely used for online decision-making under constraints. However, their applicability is often limited by their reliance on expectation-based linear constraints. Chance-Constrained MDPs (CCMDPs) address this by incorporating nonlinear, probabilistic constraints, yet they are often intractable and are typically approximated via CVaR-based reformulations. In this letter, we propose a tractable framework for CCMDPs that exactly solves for the optimal deterministic policy via a three-stage, model-based constraint learning algorithm. Theoretically, we establish a polynomial sample complexity guarantee for feasible policy optimization using a novel distributional concentration analysis. A case study on a thermostatically controlled load demonstrates the effectiveness of our approach.
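For context, the distinction among the constraint types mentioned in the abstract can be made concrete. Below is a minimal sketch of the standard formulations, assuming a per-step cost c(s_t, a_t), cumulative cost over horizon H under policy π, budget d, and risk level ε; the notation is illustrative and not taken from the letter itself.

```latex
% CMDP: expectation-based (linear) constraint
\mathbb{E}^{\pi}\!\left[\textstyle\sum_{t=0}^{H-1} c(s_t, a_t)\right] \le d

% CCMDP: chance (probabilistic) constraint at risk level \epsilon
\Pr^{\pi}\!\left(\textstyle\sum_{t=0}^{H-1} c(s_t, a_t) \le d\right) \ge 1 - \epsilon

% Common CVaR-based surrogate (conservative: satisfying it implies the chance constraint)
\mathrm{CVaR}_{\epsilon}^{\pi}\!\left[\textstyle\sum_{t=0}^{H-1} c(s_t, a_t)\right] \le d
```

The CVaR surrogate is tractable but conservative, which is the gap that motivates the exact treatment described in the abstract. As a further illustration, the sketch below estimates the satisfaction probability of a chance constraint for a fixed deterministic policy via Monte Carlo rollouts in a toy model. It is not the letter's three-stage constraint-learning algorithm; the dynamics, policy, and parameters are all hypothetical.

```python
# Illustrative only: Monte Carlo check of a chance constraint
# P(sum of costs <= d) >= 1 - eps for a fixed deterministic policy
# in a hypothetical finite-horizon stochastic system.
import numpy as np

rng = np.random.default_rng(0)

H = 24       # horizon (e.g., hours, as in a thermostatically controlled load)
d = 10.0     # cumulative cost budget
eps = 0.05   # allowed constraint-violation probability

def policy(state: float) -> float:
    # Hypothetical deterministic threshold policy.
    return 1.0 if state > 0.5 else 0.0

def step(state: float, action: float) -> tuple[float, float]:
    # Hypothetical stochastic dynamics and per-step cost.
    next_state = 0.9 * state + 0.1 * action + 0.05 * rng.standard_normal()
    cost = action + max(0.0, state - 1.0)
    return next_state, cost

def rollout_cost() -> float:
    # Simulate one trajectory and return its cumulative cost.
    state, total = 1.0, 0.0
    for _ in range(H):
        action = policy(state)
        state, cost = step(state, action)
        total += cost
    return total

costs = np.array([rollout_cost() for _ in range(10_000)])
p_sat = np.mean(costs <= d)  # empirical estimate of P(C <= d)
print(f"estimated satisfaction probability: {p_sat:.3f} (need >= {1 - eps})")
```

Such a sampling-based feasibility check is where sample complexity enters: the number of rollouts needed to certify the chance constraint with high confidence is exactly the kind of quantity the letter's polynomial sample complexity guarantee bounds.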