Following Newton direction in Policy Gradient with parameter exploration
Giorgio Manganini, Matteo Pirotta, Marcello Restelli, L. Bascetta
2015 International Joint Conference on Neural Networks (IJCNN), pp. 1-8, published 2015-07-12
DOI: 10.1109/IJCNN.2015.7280673
Citations: 5
Abstract
This paper investigates the use of second-order methods to solve Markov Decision Processes (MDPs). Despite the popularity of second-order methods in the optimization literature, little attention has so far been paid to extending such techniques to sequential decision problems. Here we provide a model-free Reinforcement Learning method that estimates the Newton direction by sampling directly in the parameter space. In order to compute the Newton direction, we provide the formulation of the Hessian of the expected return, a technique for variance reduction in the sample-based estimation, and a finite-sample analysis for the case of the Normal distribution. Besides discussing the theoretical properties, we empirically evaluate the method on an instructional linear-quadratic regulator and on a complex dynamical quadrotor system.
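The core idea sketched in the abstract, estimating a Newton direction from samples drawn directly in parameter space, can be illustrated with a minimal likelihood-ratio estimator. The sketch below is an assumption-laden illustration, not the paper's actual algorithm: it assumes an isotropic Gaussian hyper-distribution over policy parameters, uses the mean return as a variance-reduction baseline, and the function names (`newton_pgpe_step`, `sample_return`) are invented for this example.

```python
import numpy as np

def newton_pgpe_step(mu, sigma, sample_return, n_samples=500, ridge=1e-3, rng=None):
    """One sample-based Newton step on J(mu) = E_{theta ~ N(mu, sigma^2 I)}[R(theta)].

    Hypothetical sketch using the standard likelihood-ratio identities:
      grad J = E[R * (theta - mu) / sigma^2]
      Hess J = E[R * ((theta - mu)(theta - mu)^T / sigma^4 - I / sigma^2)]
    A constant baseline (the mean return) reduces estimator variance without
    introducing bias; a small ridge term keeps the estimated Hessian invertible.
    """
    rng = np.random.default_rng() if rng is None else rng
    d = mu.size
    # Sample parameter vectors directly from the Gaussian hyper-distribution.
    thetas = mu + sigma * rng.standard_normal((n_samples, d))
    returns = np.array([sample_return(t) for t in thetas])
    baseline = returns.mean()
    centered = (thetas - mu) / sigma**2          # score function of the Gaussian
    adv = returns - baseline
    grad = (adv[:, None] * centered).mean(axis=0)
    # Outer products of scores, corrected by the Hessian of the log-density.
    outer = centered[:, :, None] * centered[:, None, :]
    hess = (adv[:, None, None] * (outer - np.eye(d) / sigma**2)).mean(axis=0)
    # Newton step for maximization: near a maximum the Hessian is negative
    # definite, so mu - H^{-1} g moves uphill; the ridge makes H well-conditioned.
    direction = np.linalg.solve(hess - ridge * np.eye(d), grad)
    return mu - direction
```

On a quadratic return such as R(theta) = -||theta - theta*||^2, the expected-return surface in mu is itself quadratic, so a single Newton step lands (up to sampling noise) at the optimum, which is where second-order updates pay off over plain gradient ascent.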