Policy Gradient Methods for Robotics

2006 IEEE/RSJ International Conference on Intelligent Robots and Systems. Pub Date: 2006-10-01. DOI: 10.1109/IROS.2006.282564

Jan Peters, S. Schaal

Citations: 575

Abstract

Acquiring and improving motor skills and control policies through trial and error is essential if robots are ever to leave precisely pre-structured environments. However, to date only a few existing reinforcement learning methods have been scaled to high-dimensional robots such as manipulators, legged robots, or humanoids. Policy gradient methods remain one of the few exceptions and have found a variety of applications. Nevertheless, applying such methods is not without peril if done in an uninformed manner. In this paper, we give an overview of learning with policy gradient methods for robotics, with a strong focus on recent advances in the field. We outline previous applications to robotics and show how the most recently developed methods can significantly improve learning performance. Finally, we evaluate our most promising algorithm on the task of hitting a baseball with an anthropomorphic arm.



