Shaped Policy Search for Evolutionary Strategies using Waypoints*

2021 IEEE International Conference on Robotics and Automation (ICRA). Pub Date: 2021-05-30. DOI: 10.1109/ICRA48506.2021.9561607

Kiran Lekkala, L. Itti

Citations: 1

Abstract

In this paper, we try to improve exploration in black-box methods, particularly evolution strategies (ES), when applied to reinforcement learning (RL) problems where intermediate waypoints/subgoals are available. Because evolution strategies are highly parallelizable, instead of extracting just a scalar cumulative reward, we use the state-action pairs from the trajectories obtained during rollouts/evaluations to learn the dynamics of the agent. The learnt dynamics are then used in the optimization procedure to speed up training. Lastly, we show how our proposed approach is universally applicable by presenting results from experiments conducted on the CARLA driving and UR5 robotic arm simulators.
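The data flow described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the 1-D point agent, the linear policy, and the names `rollout`, `fit_dynamics`, and `es_step` are all hypothetical, chosen only to show an ES loop whose rollouts return state-action transitions alongside the scalar return, with a simple dynamics model fitted from the pooled transitions. How the paper feeds the learnt dynamics back into the optimization is not stated in the abstract and is not modeled here.

```python
import random

def rollout(theta, waypoint=1.0, steps=20, dt=0.1):
    """Run a toy 1-D agent with policy a = theta * (waypoint - s).

    Returns the scalar return and the (state, action, next_state) transitions.
    """
    s, total, transitions = 0.0, 0.0, []
    for _ in range(steps):
        a = theta * (waypoint - s)
        s_next = s + dt * a              # toy dynamics (assumed for illustration)
        total += -abs(waypoint - s_next) # reward: negative distance to the waypoint
        transitions.append((s, a, s_next))
        s = s_next
    return total, transitions

def fit_dynamics(transitions):
    """Least-squares fit of the one-parameter model s_next - s = k * a."""
    num = sum(a * (s_next - s) for s, a, s_next in transitions)
    den = sum(a * a for _, a, _ in transitions)
    return num / den if den > 0 else 0.0

def es_step(theta, rng, pop=20, sigma=0.1, lr=0.05):
    """One antithetic ES update that also pools transitions from every rollout."""
    grad, pooled = 0.0, []
    for _ in range(pop):
        eps = rng.gauss(0.0, 1.0)
        r_plus, t_plus = rollout(theta + sigma * eps)
        r_minus, t_minus = rollout(theta - sigma * eps)
        grad += (r_plus - r_minus) * eps
        pooled += t_plus + t_minus       # keep the trajectories, not just the returns
    grad /= 2 * pop * sigma
    return theta + lr * grad, pooled

rng = random.Random(0)
theta = 0.5
initial_return, _ = rollout(theta)
for _ in range(30):
    theta, data = es_step(theta, rng)
final_return, _ = rollout(theta)
k_hat = fit_dynamics(data)               # estimate of dt recovered from rollout data
```

In this toy setting the fitted coefficient `k_hat` recovers the step size `dt` used by the simulated dynamics, which shows that the transitions already gathered for ES evaluation carry enough information to fit a model at no extra rollout cost.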

