Markov decision processes with risk-sensitive criteria: dynamic programming operators and discounted stochastic games

Proceedings of the 40th IEEE Conference on Decision and Control (Cat. No.01CH37228) · Published: 2001-12-04 · DOI: 10.1109/CDC.2001.980564

R. Cavazos-Cadena, E. Fernández-Gaucherand

Citations: 3

Abstract

We study discrete-time Markov decision processes with denumerable state space and bounded costs per stage. It is assumed that the decision maker exhibits a constant sensitivity to risk, and that the performance of a control policy is measured by a (long-run) risk-sensitive average cost criterion. Besides standard continuity-compactness conditions, the basic structural constraint on the decision model is that the transition law satisfies a simultaneous Doeblin condition. Within this framework, the main objective is to study the existence of bounded solutions to the risk-sensitive average cost optimality equation. Our main result guarantees a bounded solution to the optimality equation only if the risk-sensitivity coefficient λ is small enough and, via a detailed example, it can be shown that such a conclusion cannot be extended to arbitrary values of λ. Our results are in opposition to previous claims in the literature, but agree with recent results obtained via a direct probabilistic analysis. A key analysis tool developed in the paper is the definition of an appropriate operator with contractive properties, analogous to the dynamic programming operator in Bellman's equation, and a family of (value) functions with a discounted stochastic games interpretation.
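For reference, the criterion and the optimality equation mentioned in the abstract are commonly written as follows. The notation is assumed here from standard usage in the risk-sensitive MDP literature, not copied from the paper: S is the state space, A(x) the admissible actions at state x, C(x,a) the bounded one-stage cost, p_xy(a) the transition law, and λ > 0 (a risk-averse controller).

```latex
% Risk-sensitive average cost of policy \pi from initial state x
J(x,\pi) = \limsup_{n\to\infty} \frac{1}{\lambda n}
  \log E_x^{\pi}\!\left[ \exp\!\Big( \lambda \sum_{t=0}^{n-1} C(X_t, A_t) \Big) \right]

% Risk-sensitive average cost optimality equation (g a constant, h bounded)
e^{\lambda (g + h(x))} = \min_{a \in A(x)}
  \Big[ e^{\lambda C(x,a)} \sum_{y \in S} p_{xy}(a)\, e^{\lambda h(y)} \Big],
  \qquad x \in S

% Equivalent additive form (valid since \lambda > 0)
g + h(x) = \min_{a \in A(x)}
  \Big[ C(x,a) + \frac{1}{\lambda} \log \sum_{y \in S} p_{xy}(a)\, e^{\lambda h(y)} \Big]
```

Under the usual verification argument, a pair (g, h) with h bounded that solves this equation identifies g as the optimal risk-sensitive average cost, which is why the existence of bounded solutions is the central question. As λ → 0 with h fixed, (1/λ) log Σ_y p_xy(a) e^{λ h(y)} tends to Σ_y p_xy(a) h(y), recovering the risk-neutral average cost optimality equation.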

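For a fixed stationary policy on a finite, irreducible chain, the risk-sensitive average cost has a concrete characterization: with M_xy = e^{λ C(x)} p_xy, the cost equals (1/λ) log ρ(M), where ρ(M) is the Perron root of M, and h = (1/λ) log w, with w the Perron eigenvector, is a bounded solution of the fixed-policy version of the optimality equation. The Python sketch below computes both by power iteration. The function name `risk_sensitive_average_cost`, the finite state space, and the strictly positive transition matrix are illustrative assumptions, not the paper's denumerable setting. The sketch evaluates a single policy and says nothing about the minimization over actions, where the paper shows bounded solutions can fail to exist for large λ.

```python
import math


def risk_sensitive_average_cost(P, C, lam, tol=1e-12, max_iter=10_000):
    """Risk-sensitive average cost g and relative values h of a fixed policy.

    Hypothetical helper for illustration. Assumes lam != 0, a finite state
    space, and a primitive transition matrix P (e.g. all entries positive),
    so power iteration on M[x][y] = exp(lam * C[x]) * P[x][y] converges to
    the Perron eigenpair (rho, w). The returned pair satisfies
        exp(lam * (g + h[x])) = exp(lam * C[x]) * sum_y P[x][y] * exp(lam * h[y]).
    """
    n = len(P)
    M = [[math.exp(lam * C[x]) * P[x][y] for y in range(n)] for x in range(n)]
    w = [1.0] * n
    rho = 1.0
    for _ in range(max_iter):
        v = [sum(M[x][y] * w[y] for y in range(n)) for x in range(n)]
        new_rho = max(v)              # scale so the largest entry of w is 1
        v = [vx / new_rho for vx in v]
        converged = max(abs(a - b) for a, b in zip(v, w)) < tol
        w, rho = v, new_rho
        if converged:
            break
    g = math.log(rho) / lam           # risk-sensitive average cost
    h = [math.log(wx) / lam for wx in w]  # relative value function
    return g, h


if __name__ == "__main__":
    # Two-state chain: cost 0 in state 0, cost 1 in state 1.
    P = [[0.9, 0.1], [0.2, 0.8]]
    C = [0.0, 1.0]
    g, h = risk_sensitive_average_cost(P, C, lam=0.5)
    print("g =", g, "h =", h)
```

The stationary distribution of this chain is (2/3, 1/3), so its risk-neutral long-run average cost is 1/3; with λ = 0.5 the computed g should exceed 1/3, reflecting the penalty a risk-averse controller places on cost variability. Power iteration only keeps the sketch dependency-free; `numpy.linalg.eig` would be a natural alternative for larger chains.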