Authors: Tanhao Huang, Yanan Dai, Jinwen Chen
Journal: Optimization and Engineering (impact factor 2.0; JCR Q2, Engineering, Multidisciplinary)
Published: 2023-12-08
DOI: 10.1007/s11081-023-09870-4 (https://doi.org/10.1007/s11081-023-09870-4)
On maximizing probabilities for over-performing a target for Markov decision processes
This paper studies the duality between risk-sensitive control and large deviation control, in which the probability of out-performing a target is maximized, for Markov Decision Processes. To derive the duality, we apply a non-linear extension of the Krein-Rutman Theorem to characterize the optimal risk-sensitive value and prove that a stationary, deterministic optimal policy exists. The right-hand derivative of this value function is used to characterize the specific targets for which the duality holds. It is proved that the optimal policy for the “out-performing” probability can be approximated by the optimal policy for risk-sensitive control. The range of the right-hand and left-hand derivatives of the optimal risk-sensitive value function plays an important role. Some essential differences between these two types of optimal control problems are presented.
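The objects named in the abstract can be illustrated numerically. The following is a minimal sketch, not the authors' construction. It assumes a finite MDP with strictly positive transition probabilities and approximates the risk-sensitive value Λ(θ) = sup over policies of lim (1/n) log E[exp(θ · total reward)] by a normalized power iteration on the multiplicative Bellman operator, whose positive eigenpair is the kind of object a Krein-Rutman-type theorem provides. The names `risk_sensitive_value`, `rate`, `P`, `r`, and `theta` are hypothetical, and the Legendre-transform form used in `rate` is the usual shape of such a duality; the paper's exact statement and the targets for which it holds are not reproduced here.

```python
import numpy as np


def risk_sensitive_value(P, r, theta, tol=1e-10, max_iter=10_000):
    """Approximate Lambda(theta) and a greedy stationary deterministic policy.

    P: array of shape (A, S, S); P[a, x, y] is a transition probability,
       assumed strictly positive so the iteration settles on one eigenpair.
    r: array of shape (S, A) of one-step rewards.
    """
    h = np.ones(r.shape[0])  # candidate positive eigenfunction, max-normalized
    rho = 1.0                # running estimate of exp(Lambda(theta))
    for _ in range(max_iter):
        # Q[x, a] = exp(theta * r[x, a]) * sum_y P[a, x, y] * h[y]
        Q = np.exp(theta * r) * (P @ h).T
        Th = Q.max(axis=1)        # multiplicative Bellman operator applied to h
        rho_new = Th.max()        # normalizing constant, tends to the eigenvalue
        h_new = Th / rho_new
        done = (abs(rho_new - rho) <= tol * rho_new
                and np.max(np.abs(h_new - h)) <= tol)
        h, rho = h_new, rho_new
        if done:
            break
    # The maximizing action in each state gives a stationary deterministic policy.
    return float(np.log(rho)), Q.argmax(axis=1)


def rate(P, r, target, thetas):
    """Grid approximation of sup over theta >= 0 of theta * target - Lambda(theta)."""
    return max(t * target - risk_sensitive_value(P, r, t)[0] for t in thetas)


if __name__ == "__main__":
    # Hypothetical two-state, two-action MDP used only for illustration.
    P = np.array([[[0.9, 0.1], [0.5, 0.5]],
                  [[0.2, 0.8], [0.3, 0.7]]])
    r = np.array([[0.0, 1.0],
                  [2.0, 0.5]])
    lam, policy = risk_sensitive_value(P, r, theta=1.0)
    print("Lambda(1.0) =", lam, "greedy policy =", policy)
    print("rate at target 1.5 =", rate(P, r, 1.5, np.linspace(0.0, 5.0, 51)))
```

Under these assumptions, the out-performing probability for a target c would decay roughly like exp(-n · rate(c)), and the greedy policy returned for the maximizing θ is the natural candidate for approximating the optimal out-performing policy. The grid over θ is a simplification; which targets admit an exact maximizer depends on the range of the one-sided derivatives of Λ, as the abstract notes.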
About the journal:
Optimization and Engineering is a multidisciplinary journal; its primary goal is to promote the application of optimization methods in the general area of engineering sciences. We expect submissions to OPTE not only to make a significant optimization contribution but also to impact a specific engineering application.
Topics of Interest:
-Optimization: All methods and algorithms of mathematical optimization, including blackbox and derivative-free optimization, continuous optimization, discrete optimization, global optimization, linear and conic optimization, multiobjective optimization, PDE-constrained optimization & control, and stochastic optimization. Numerical and implementation issues, optimization software, benchmarking, and case studies.
-Engineering Sciences: Aerospace engineering, biomedical engineering, chemical & process engineering, civil, environmental, & architectural engineering, electrical engineering, financial engineering, geosciences, healthcare engineering, industrial & systems engineering, mechanical engineering & MDO, and robotics.