A learning algorithm for Markov decision processes with adaptive state aggregation

Proceedings of the 39th IEEE Conference on Decision and Control (Cat. No.00CH37187). Pub Date: 2000-12-12. DOI: 10.1109/CDC.2000.912220

J. Baras, V. Borkar

Citations: 28

Abstract

We propose a simulation-based algorithm for learning good policies for a Markov decision process with aggregated states and an unknown transition law. The state aggregation itself can be adapted on a slower time scale by an auxiliary learning algorithm. Rigorous justifications are provided for both algorithms.
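
To make the idea of learning over aggregated states concrete, here is a minimal sketch. It is not the algorithm from the paper, whose details the abstract does not give. It assumes ordinary Q-learning on a hypothetical six-state chain, a fixed aggregation map that groups neighbouring states into clusters of two, a uniformly random behaviour policy, and a step size of n^(-0.7). The function name, the toy environment, and all parameters are illustrative assumptions. The slower-time-scale adaptation of the aggregation is not shown.

```python
import random

def aggregated_q_learning(n_states=6, cluster_size=2, episodes=2000,
                          gamma=0.9, max_steps=200, seed=0):
    """Q-learning whose table is indexed by cluster rather than by state.

    Toy chain (an assumption for illustration): states 0..n_states-1,
    action 1 moves right, action 0 moves left, reward 1 on reaching the
    last state, which ends the episode. Transitions are only simulated;
    the learner never reads the transition law directly.
    """
    rng = random.Random(seed)
    n_clusters = (n_states + cluster_size - 1) // cluster_size
    goal = n_states - 1
    q = [[0.0, 0.0] for _ in range(n_clusters)]     # one row per cluster
    visits = [[0, 0] for _ in range(n_clusters)]    # per (cluster, action) counts

    for _ in range(episodes):
        s = 0
        for _ in range(max_steps):
            a = rng.randrange(2)                    # uniform exploration
            s_next = min(s + 1, goal) if a == 1 else max(s - 1, 0)
            done = s_next == goal
            reward = 1.0 if done else 0.0

            c = s // cluster_size                   # fixed aggregation map
            c_next = s_next // cluster_size
            target = reward if done else reward + gamma * max(q[c_next])

            visits[c][a] += 1
            alpha = visits[c][a] ** -0.7            # decreasing step size
            q[c][a] += alpha * (target - q[c][a])

            if done:
                break
            s = s_next
    return q

if __name__ == "__main__":
    q = aggregated_q_learning()
    greedy = [row.index(max(row)) for row in q]     # greedy action per cluster
    print(q, greedy)
```

All states in a cluster share one table row, so the learner stores three rows instead of six. The price is that it can no longer distinguish states within a cluster, which is why the choice of aggregation matters.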
