{"title":"Markov decision processes with risk-sensitive criteria: dynamic programming operators and discounted stochastic games","authors":"R. Cavazos-Cadena, E. Fernández-Gaucherand","doi":"10.1109/CDC.2001.980564","journal":{"name":"Proceedings of the 40th IEEE Conference on Decision and Control (Cat. No.01CH37228)"},"publicationDate":"2001-12-04","citationCount":"3","platform":"Semanticscholar","ListUrlMain":"https://doi.org/10.1109/CDC.2001.980564"}
Markov decision processes with risk-sensitive criteria: dynamic programming operators and discounted stochastic games
We study discrete-time Markov decision processes with denumerable state space and bounded costs per stage. It is assumed that the decision maker exhibits a constant sensitivity to risk, and that the performance of a control policy is measured by a (long-run) risk-sensitive average cost criterion. Besides standard continuity-compactness conditions, the basic structural constraint on the decision model is that the transition law satisfies a simultaneous Doeblin condition. Within this framework, the main objective is to study the existence of bounded solutions to the risk-sensitive average cost optimality equation. Our main result guarantees a bounded solution to the optimality equation only if the risk sensitivity coefficient λ is small enough and, via a detailed example, it can be shown that such a conclusion cannot be extended to arbitrary values of λ. Our results are in opposition to previous claims in the literature, but agree with recent results obtained via a direct probabilistic analysis. A key analysis tool developed in the paper is the definition of an appropriate operator with contractive properties, analogous to the dynamic programming operator in Bellman's equation, and a family of (value) functions with a discounted stochastic games interpretation.
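To make the risk-sensitive average cost criterion concrete, the following is a minimal sketch and not the operator constructed in the paper. It assumes a much simpler setting than the abstract's: a finite state space, a single fixed policy, and a strictly positive transition matrix. Under those assumptions the criterion, taken here in its usual form as the long-run limit of (1/(λn)) log E[exp(λ × total cost over n stages)], equals (1/λ) log ρ, where ρ is the Perron root of the matrix with entries exp(λ c(x)) P(x, y). The function name, the example chain, and the costs are all hypothetical.

```python
import math


def risk_sensitive_average_cost(P, c, lam, iters=2000):
    """Risk-sensitive average cost of a fixed policy on a finite chain.

    Assumptions (illustrative, not from the paper): P is a row-stochastic
    matrix with strictly positive entries, c[x] is the cost paid in state x,
    and lam != 0 is the risk sensitivity coefficient.
    Returns (1/lam) * log(rho), where rho is the Perron root of
    Q[x][y] = exp(lam * c[x]) * P[x][y], found by power iteration.
    """
    n = len(P)
    v = [1.0] * n
    log_rho = 0.0
    for _ in range(iters):
        # One application of Q to the current vector.
        w = [
            math.exp(lam * c[x]) * sum(P[x][y] * v[y] for y in range(n))
            for x in range(n)
        ]
        # Normalize so max(v) == 1; the scale then converges to rho.
        scale = max(w)
        v = [wx / scale for wx in w]
        log_rho = math.log(scale)
    return log_rho / lam


# Hypothetical two-state chain: state 1 costs 1 per stage, state 0 is free.
P = [[0.9, 0.1], [0.5, 0.5]]
c = [0.0, 1.0]

for lam in (0.001, 0.5, 2.0):
    print(lam, risk_sensitive_average_cost(P, c, lam))
```

For this chain the stationary distribution is (5/6, 1/6), so the risk-neutral average cost is 1/6. The value at a very small λ should sit close to that figure, and larger positive λ (a more risk-averse controller) should give larger values that never exceed the maximum one-stage cost.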