The policy gradient estimation of continuous-time hidden Markov decision processes
Liao Yanjie, Yin Bao-qun, Xi Hongsheng
2005 IEEE International Conference on Information Acquisition
DOI: 10.1109/ICIA.2005.1635101
Citations: 1
Abstract
Recently, gradient-based methods have received much attention for optimizing dynamic systems with hidden information, such as routing problems in robotic systems. In this paper, we present the continuous-time hidden Markov decision process (CTHMDP), which can be used to model such robotic systems, and study the problem of policy gradient estimation for this process. First, an approximation formula for the gradient is presented; then, using the uniformization method, we introduce an algorithm that can be considered an extension of the gradient of partially observable Markov decision process (GPOMDP) algorithm to the continuous-time model. Finally, the convergence and error bound of the algorithm are analyzed.
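The abstract names two ingredients: uniformization, which turns a continuous-time generator into a discrete-time transition matrix, and a GPOMDP-style eligibility-trace gradient estimator run on the resulting chain. The paper's actual algorithm is not reproduced here; the following is a minimal, hypothetical sketch of those two ingredients under assumed dynamics. All function names, the toy softmax policy over observations, and the state/observation/reward interfaces are illustrative assumptions, not the authors' construction.

```python
import numpy as np

def uniformize(Q, Lam=None):
    """Uniformize a continuous-time generator Q (rows sum to 0) into a
    discrete-time stochastic matrix P = I + Q / Lam.
    Lam must be at least the largest exit rate max_i(-Q[i, i])."""
    if Lam is None:
        Lam = max(-Q[i, i] for i in range(Q.shape[0]))
    return np.eye(Q.shape[0]) + Q / Lam, Lam

def gpomdp_gradient(theta, Q_of, obs_of, reward, n_states, n_actions,
                    beta=0.9, T=50_000, rng=None):
    """GPOMDP-style gradient estimate on the uniformized chain (a sketch,
    not the paper's CTHMDP algorithm).

    theta: (n_obs, n_actions) logits of an assumed softmax policy over
    observations. Q_of(a): generator under action a. obs_of(s): observation
    emitted in state s. reward(s): immediate reward of state s.
    """
    rng = np.random.default_rng(rng)
    s = 0
    z = np.zeros_like(theta)      # eligibility trace of score functions
    delta = np.zeros_like(theta)  # running average: the gradient estimate
    for t in range(T):
        o = obs_of(s)
        logits = theta[o]
        pi = np.exp(logits - logits.max())
        pi /= pi.sum()
        a = rng.choice(n_actions, p=pi)
        # Score function grad_theta log pi(a | o) for the softmax policy.
        grad_log = np.zeros_like(theta)
        grad_log[o] = -pi
        grad_log[o, a] += 1.0
        # Step the uniformized discrete-time chain under action a.
        P, _ = uniformize(Q_of(a))
        s = rng.choice(n_states, p=P[s])
        # Discounted trace and incremental average, as in GPOMDP.
        z = beta * z + grad_log
        delta += (reward(s) * z - delta) / (t + 1)
    return delta
```

Uniformization is what makes the discrete-time GPOMDP machinery applicable: once P = I + Q/Lam is a stochastic matrix, the continuous-time process can be simulated as a discrete-time chain and the usual eligibility-trace estimator applied step by step.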