Finite horizon analysis of infinite CTMDPs

IEEE/IFIP International Conference on Dependable Systems and Networks (DSN 2012) Pub Date : 2012-06-25 DOI:10.1109/DSN.2012.6263929

P. Buchholz

Citations: 3

Abstract

Continuous Time Markov Decision Processes (CTMDPs) are used to describe optimization problems in many applications, including system maintenance and control. Often one is interested in a control strategy or policy that optimizes the gain of a system over a finite interval, known as the finite horizon. The computation of an ε-optimal policy, i.e., a policy that reaches the optimal gain up to some small ε, is often hindered by state space explosion: the state spaces of realistic models can be very large or even infinite. The paper presents new algorithms to compute approximately optimal policies for CTMDPs with large or infinite state spaces. The new approach allows one to compute bounds on the achievable gain, and a policy that reaches the lower bound, using a variant of uniformization on a finite subset of the state space. It is also shown how the approach can be applied to models with unbounded rewards or transition rates, for which uniformization cannot be applied per se.
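To illustrate the uniformization step the abstract refers to, here is a minimal Python sketch of plain uniformization for a finite continuous-time Markov chain, that is, a CTMDP after a policy has been fixed and the state space has been cut down to a finite subset. It is not the paper's algorithm: it does not optimize over policies and it computes no bounds on the gain. The function name `uniformized_transient`, the two-state generator matrix, and the tolerance `eps` are assumptions made purely for illustration.

```python
import math


def uniformized_transient(Q, pi0, t, eps=1e-10, max_terms=100_000):
    """Transient distribution pi(t) of a finite CTMC with generator Q.

    Illustrative sketch only: assumes bounded exit rates and a moderate
    value of alpha * t (exp(-alpha * t) must not underflow).
    """
    n = len(Q)
    # Uniformization rate: the largest exit rate of any state.
    alpha = max(-Q[i][i] for i in range(n))
    if alpha <= 0.0:
        return list(pi0)  # no transitions are possible
    # Uniformized discrete-time chain: P = I + Q / alpha.
    P = [[(1.0 if i == j else 0.0) + Q[i][j] / alpha for j in range(n)]
         for i in range(n)]
    v = list(pi0)                   # v holds pi0 * P^k
    weight = math.exp(-alpha * t)   # Poisson(k; alpha * t) for k = 0
    covered = weight                # Poisson mass accumulated so far
    result = [weight * x for x in v]
    for k in range(1, max_terms + 1):
        if 1.0 - covered <= eps:    # remaining Poisson mass bounds the error
            break
        v = [sum(v[i] * P[i][j] for i in range(n)) for j in range(n)]
        weight *= alpha * t / k
        covered += weight
        result = [r + weight * x for r, x in zip(result, v)]
    return result


# Hypothetical two-state example: rate 1 from state 0 to 1, rate 2 back.
Q_example = [[-1.0, 1.0], [2.0, -2.0]]
pi_t = uniformized_transient(Q_example, [1.0, 0.0], t=1.0)
```

The truncation point is chosen so that the neglected Poisson probability mass is at most `eps`, which bounds the error in each entry of the result. With unbounded transition rates no finite `alpha` exists, which is why, as the abstract notes, uniformization cannot be applied directly to such models.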



