Tianjiao Li, Ziwei Guan, Shaofeng Zou, Tengyu Xu, Yingbin Liang, Guanghui Lan
Operations Research Letters, Volume 54, Article 107107. Published 2024-03-06. DOI: 10.1016/j.orl.2024.107107. https://www.sciencedirect.com/science/article/pii/S0167637724000439
Faster algorithm and sharper analysis for constrained Markov decision process
The problem of constrained Markov decision process (CMDP) is investigated, where an agent aims to maximize the expected accumulated reward subject to constraints on its utilities/costs. We propose a new primal-dual approach with a novel integration of entropy regularization and Nesterov's accelerated gradient method. The proposed approach is shown to converge to the global optimum with a complexity of $\tilde{O}(1/\epsilon)$ in terms of the optimality gap and the constraint violation, which improves the complexity of the existing primal-dual approaches by a factor of $O(1/\epsilon)$.
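For orientation, the setting described in the abstract can be written down in a generic form. The block below is a minimal sketch using standard CMDP notation, not necessarily the paper's own: the symbols $V_0^{\pi}$, $V_i^{\pi}$, $c_i$, $\rho$, $\lambda$, $\tau$, and $\mathcal{H}$ are all assumptions introduced here for illustration.

```latex
% Assumed notation (not taken from the abstract):
%   \rho            initial state distribution
%   V_0^{\pi}(\rho) expected accumulated reward of policy \pi
%   V_i^{\pi}(\rho) expected accumulated utility i, with threshold c_i, i = 1,...,m
%   \lambda         dual variables (Lagrange multipliers), \lambda_i \ge 0
%   \tau            entropy-regularization weight, \tau > 0
%   \mathcal{H}     accumulated policy entropy

% Constrained problem
\max_{\pi} \; V_0^{\pi}(\rho)
\quad \text{subject to} \quad V_i^{\pi}(\rho) \ge c_i, \qquad i = 1, \dots, m.

% Lagrangian and the primal-dual (saddle-point) form
L(\pi, \lambda) = V_0^{\pi}(\rho) + \sum_{i=1}^{m} \lambda_i \bigl( V_i^{\pi}(\rho) - c_i \bigr),
\qquad
\max_{\pi} \, \min_{\lambda \ge 0} \; L(\pi, \lambda).

% One generic way to add entropy regularization to the primal side
L_{\tau}(\pi, \lambda) = L(\pi, \lambda) + \tau \, \mathcal{H}(\pi).
```

In this generic picture, the primal step updates the policy $\pi$ against $L_{\tau}$, and the dual step updates $\lambda$ by a (possibly accelerated) gradient method. How the paper combines entropy regularization with Nesterov's method in detail is not stated in the abstract, so the sketch should be read only as background for the terms it uses.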
About the journal:
Operations Research Letters is committed to the rapid review and fast publication of short articles on all aspects of operations research and analytics. Apart from a limitation to eight journal pages, quality, originality, relevance and clarity are the only criteria for selecting the papers to be published. ORL covers the broad field of optimization, stochastic models and game theory. Specific areas of interest include networks, routing, location, queueing, scheduling, inventory, reliability, and financial engineering. We wish to explore interfaces with other fields such as life sciences and health care, artificial intelligence and machine learning, energy distribution, and computational social sciences and humanities. Our traditional strength is in methodology, including theory, modelling, algorithms and computational studies. We also welcome novel applications and concise literature reviews.