基于路径点的进化策略形策略搜索*

2021 IEEE International Conference on Robotics and Automation (ICRA) Pub Date : 2021-05-30 DOI:10.1109/ICRA48506.2021.9561607

Kiran Lekkala, L. Itti

{"title":"基于路径点的进化策略形策略搜索*","authors":"Kiran Lekkala, L. Itti","doi":"10.1109/ICRA48506.2021.9561607","DOIUrl":null,"url":null,"abstract":"In this paper, we try to improve exploration in Blackbox methods, particularly Evolution strategies (ES), when applied to Reinforcement Learning (RL) problems where intermediate waypoints/subgoals are available. Since Evolutionary strategies are highly parallelizable, instead of extracting just a scalar cumulative reward, we use the state-action pairs from the trajectories obtained during rollouts/evaluations, to learn the dynamics of the agent. The learnt dynamics are then used in the optimization procedure to speed-up training. Lastly, we show how our proposed approach is universally applicable by presenting results from experiments conducted on Carla driving and UR5 robotic arm simulators.","PeriodicalId":108312,"journal":{"name":"2021 IEEE International Conference on Robotics and Automation (ICRA)","volume":"174 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-05-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":"{\"title\":\"Shaped Policy Search for Evolutionary Strategies using Waypoints*\",\"authors\":\"Kiran Lekkala, L. Itti\",\"doi\":\"10.1109/ICRA48506.2021.9561607\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"In this paper, we try to improve exploration in Blackbox methods, particularly Evolution strategies (ES), when applied to Reinforcement Learning (RL) problems where intermediate waypoints/subgoals are available. Since Evolutionary strategies are highly parallelizable, instead of extracting just a scalar cumulative reward, we use the state-action pairs from the trajectories obtained during rollouts/evaluations, to learn the dynamics of the agent. The learnt dynamics are then used in the optimization procedure to speed-up training. Lastly, we show how our proposed approach is universally applicable by presenting results from experiments conducted on Carla driving and UR5 robotic arm simulators.\",\"PeriodicalId\":108312,\"journal\":{\"name\":\"2021 IEEE International Conference on Robotics and Automation (ICRA)\",\"volume\":\"174 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2021-05-30\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"1\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2021 IEEE International Conference on Robotics and Automation (ICRA)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ICRA48506.2021.9561607\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2021 IEEE International Conference on Robotics and Automation (ICRA)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICRA48506.2021.9561607","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 1

摘要

在本文中，我们试图改进黑箱方法的探索，特别是进化策略(ES)，当应用于强化学习(RL)问题时，中间路径点/子目标是可用的。由于进化策略是高度并行化的，我们使用在推出/评估期间获得的轨迹中的状态-动作对来学习智能体的动态，而不是仅仅提取标量累积奖励。然后在优化过程中使用学习到的动力学来加速训练。最后，我们通过展示在Carla驾驶和UR5机械臂模拟器上进行的实验结果，展示了我们提出的方法是如何普遍适用的。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Shaped Policy Search for Evolutionary Strategies using Waypoints*

In this paper, we try to improve exploration in Blackbox methods, particularly Evolution strategies (ES), when applied to Reinforcement Learning (RL) problems where intermediate waypoints/subgoals are available. Since Evolutionary strategies are highly parallelizable, instead of extracting just a scalar cumulative reward, we use the state-action pairs from the trajectories obtained during rollouts/evaluations, to learn the dynamics of the agent. The learnt dynamics are then used in the optimization procedure to speed-up training. Lastly, we show how our proposed approach is universally applicable by presenting results from experiments conducted on Carla driving and UR5 robotic arm simulators.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

2021 IEEE International Conference on Robotics and Automation (ICRA)

自引率

0.00%

发文量