Enhancing the episodic natural actor-critic algorithm by a regularisation term to stabilize learning of control structures

2011 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL). Pub Date: 2011-04-11. DOI: 10.1109/ADPRL.2011.5967352

A. Witsch, R. Reichle, K. Geihs, S. Lange, Martin A. Riedmiller

Citations: 0

Abstract

Incomplete or imprecise models of control systems make it difficult to find an appropriate structure and parameter set for a corresponding control policy. Reinforcement learning algorithms such as policy gradient methods address these problems. We describe how to stabilise policy gradient descent by adding a regularisation term to the episodic natural actor-critic approach. This allows the approach to be used more independently of the particular policy.
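The abstract does not state the form of the regularisation term, so the following is only an illustrative sketch and not the authors' method. It assumes the usual episodic natural actor-critic formulation, in which the natural gradient w and a baseline b are obtained by least-squares regression of each episode's discounted return on its summed discounted log-policy gradients, and it assumes a ridge (Tikhonov) penalty lam on w as the regulariser. The names solve_linear, regularised_enac_gradient and lam are hypothetical.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the inputs are not mutated.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    # Back-substitution on the upper-triangular system.
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def regularised_enac_gradient(features, returns, lam=0.0):
    """Estimate (w, b) from per-episode statistics.

    features[e]: summed discounted log-policy gradients of episode e.
    returns[e]:  discounted return of episode e.
    Minimises sum_e (features[e] . w + b - returns[e])^2 + lam * |w|^2.
    The ridge penalty is an assumption made for this sketch.
    """
    d = len(features[0])
    rows = [list(f) + [1.0] for f in features]  # extra column for baseline b
    k = d + 1
    gram = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    rhs = [sum(r[i] * ret for r, ret in zip(rows, returns)) for i in range(k)]
    for i in range(d):  # penalise w only, not the baseline
        gram[i][i] += lam
    sol = solve_linear(gram, rhs)
    return sol[:d], sol[d]


if __name__ == "__main__":
    # Toy data: the second policy parameter never influences the episodes,
    # so the unregularised normal equations are singular.
    phi = [[1.0, 0.0], [2.0, 0.0], [3.0, 0.0]]
    ret = [1.0, 2.0, 3.0]
    w, b = regularised_enac_gradient(phi, ret, lam=0.1)
    print("w =", w, "b =", b)
```

In this toy setting the unregularised system (lam=0.0) cannot be solved because one parameter direction carries no information, while a small positive lam yields a finite gradient estimate with a zero component in that direction. This illustrates one way a regularisation term can stabilise the gradient estimate.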
