Uncertainty handling CMA-ES for reinforcement learning

Proceedings of the 11th Annual Conference on Genetic and Evolutionary Computation. Pub Date: 2009-07-08. DOI: 10.1145/1569901.1570064

V. Heidrich-Meisner, C. Igel

Citations: 14

Abstract

The covariance matrix adaptation evolution strategy (CMA-ES) has proven to be a powerful method for reinforcement learning (RL). Recently, the CMA-ES has been augmented with an adaptive uncertainty handling mechanism. Because uncertainty is a typical property of RL problems, this new algorithm, termed UH-CMA-ES, is promising for RL. The UH-CMA-ES dynamically adjusts the number of episodes considered in each evaluation of a policy. It controls the signal-to-noise ratio such that it is just high enough for a sufficiently good ranking of candidate policies, which in turn allows the evolutionary learning to find better solutions. This significantly increases the learning speed as well as the robustness without impairing the quality of the final solutions. We evaluate the UH-CMA-ES on fully and partially observable Markov decision processes with random start states and noisy observations. A canonical natural policy gradient method and random search serve as baselines for comparison.
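The idea of adjusting the number of episodes per evaluation can be illustrated with a minimal Python sketch. It is not the mechanism from the paper, whose details the abstract does not give. It assumes a hypothetical toy task (a policy is a single float with noisy return), a simple uncertainty measure (mean absolute rank change between two independent evaluations), and hypothetical `threshold` and `factor` parameters chosen only for illustration.

```python
import math
import random


def mean_return(policy, episodes, rng):
    """Average noisy return of a toy policy (a single float) over several episodes."""
    # Hypothetical toy task: true quality is -policy**2, observed with Gaussian noise.
    return sum(-policy ** 2 + rng.gauss(0.0, 1.0) for _ in range(episodes)) / episodes


def ranks(values):
    """Rank position of each entry (0 = best, i.e. highest value)."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    result = [0] * len(values)
    for position, index in enumerate(order):
        result[index] = position
    return result


def rank_change(policies, episodes, rng):
    """Mean absolute rank change between two evaluations, scaled to [0, 1]."""
    n = len(policies)
    if n < 2:
        return 0.0  # a single candidate cannot change rank
    first = ranks([mean_return(p, episodes, rng) for p in policies])
    second = ranks([mean_return(p, episodes, rng) for p in policies])
    total = sum(abs(a - b) for a, b in zip(first, second))
    return total / ((n * n) // 2)  # (n*n)//2 is the largest possible total


def adapt_episodes(episodes, uncertainty, threshold=0.2, factor=1.5, lo=1, hi=1000):
    """Use more episodes when the ranking is unstable, fewer when it is stable."""
    if uncertainty > threshold:
        return min(hi, math.ceil(episodes * factor))
    return max(lo, math.floor(episodes / factor))


if __name__ == "__main__":
    rng = random.Random(0)
    candidates = [rng.uniform(-1.0, 1.0) for _ in range(10)]
    episodes = 1
    for generation in range(10):
        u = rank_change(candidates, episodes, rng)
        episodes = adapt_episodes(episodes, u)
        print(generation, round(u, 3), episodes)
```

In a full evolutionary loop, `episodes` would be carried from one generation to the next, so evaluation effort grows only while noise disturbs the ranking of candidates and shrinks again once the ranking is stable.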


