使用Actor-Critic架构开发基于行为的机器人操作

2021 8th International Conference on Signal Processing and Integrated Networks (SPIN) Pub Date : 2021-08-26 DOI:10.1109/SPIN52536.2021.9566102

Priya Shukla, Madhurjya Pegu, G. Nandi

{"title":"使用Actor-Critic架构开发基于行为的机器人操作","authors":"Priya Shukla, Madhurjya Pegu, G. Nandi","doi":"10.1109/SPIN52536.2021.9566102","DOIUrl":null,"url":null,"abstract":"Developing behavior based robotic manipulation is a very challenging but necessary task to be solved, especially for humanoid and social robots. Fundamental robotic tasks such as grasping, pick and place, trajectory following are at present solved using conventional forward and inverse kinematics (IK), dynamics and trajectory planning, whereas we learn these complex tasks using past experiences. In this paper, we explore developing behavior based robotic manipulation using reinforcement learning, more specifically learning directly from experiences through interactions with the real world and without knowing the transition model of the environment. Here, we propose a multi agent paradigm to gather experiences from multiple environments in parallel along with a model for populating new generation of agents using Evolutionary Actor-Critic Algorithm (EACA). The agents are of actor-critic architecture and both of them comprises of general purpose neural networks. The actor-critic architecture enables the model to perform well both in high dimensional state space and high dimensional action space which is very crucial for all robotic applications. The proposed algorithm is benchmarked with respect to different multi agent paradigm but keeping the agent’s architecture same. Reinforcement learning, being highly data intensive, requires the use of the CPU and GPU cores to be done judiciously for sampling the environment as well as for training, the details of which have been described here. We have run rigorous experiments for learning joint trajectories on the open gym based KUKA arm manipulator, where our proposed method achieves learning stability within 300 episodes, as compared to the state-of-the-art actor-critic and Advanced Asynchronous Actor-Critic (A3C) algorithms both of which take more than 1000 episodes for learning the same task, showing the effectiveness of our proposed model.","PeriodicalId":343177,"journal":{"name":"2021 8th International Conference on Signal Processing and Integrated Networks (SPIN)","volume":"30 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-08-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":"{\"title\":\"Development of Behavior based Robot manipulation using Actor-Critic architecture\",\"authors\":\"Priya Shukla, Madhurjya Pegu, G. Nandi\",\"doi\":\"10.1109/SPIN52536.2021.9566102\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Developing behavior based robotic manipulation is a very challenging but necessary task to be solved, especially for humanoid and social robots. Fundamental robotic tasks such as grasping, pick and place, trajectory following are at present solved using conventional forward and inverse kinematics (IK), dynamics and trajectory planning, whereas we learn these complex tasks using past experiences. In this paper, we explore developing behavior based robotic manipulation using reinforcement learning, more specifically learning directly from experiences through interactions with the real world and without knowing the transition model of the environment. Here, we propose a multi agent paradigm to gather experiences from multiple environments in parallel along with a model for populating new generation of agents using Evolutionary Actor-Critic Algorithm (EACA). The agents are of actor-critic architecture and both of them comprises of general purpose neural networks. The actor-critic architecture enables the model to perform well both in high dimensional state space and high dimensional action space which is very crucial for all robotic applications. The proposed algorithm is benchmarked with respect to different multi agent paradigm but keeping the agent’s architecture same. Reinforcement learning, being highly data intensive, requires the use of the CPU and GPU cores to be done judiciously for sampling the environment as well as for training, the details of which have been described here. We have run rigorous experiments for learning joint trajectories on the open gym based KUKA arm manipulator, where our proposed method achieves learning stability within 300 episodes, as compared to the state-of-the-art actor-critic and Advanced Asynchronous Actor-Critic (A3C) algorithms both of which take more than 1000 episodes for learning the same task, showing the effectiveness of our proposed model.\",\"PeriodicalId\":343177,\"journal\":{\"name\":\"2021 8th International Conference on Signal Processing and Integrated Networks (SPIN)\",\"volume\":\"30 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2021-08-26\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"2\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2021 8th International Conference on Signal Processing and Integrated Networks (SPIN)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/SPIN52536.2021.9566102\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2021 8th International Conference on Signal Processing and Integrated Networks (SPIN)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/SPIN52536.2021.9566102","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 2

摘要

开发基于行为的机器人操作是一个非常具有挑战性但又必须解决的任务，特别是对于类人机器人和社交机器人。基本的机器人任务，如抓取、拾取和放置、轨迹跟踪，目前是使用传统的正逆运动学(IK)、动力学和轨迹规划来解决的，而我们使用过去的经验来学习这些复杂的任务。在本文中，我们探索了使用强化学习开发基于行为的机器人操作，更具体地说，通过与现实世界的交互直接从经验中学习，而不知道环境的过渡模型。在这里，我们提出了一个多智能体范式，以并行地从多个环境中收集经验，以及一个使用进化行为者批评算法(EACA)填充新一代智能体的模型。智能体是演员-评论家结构，它们都由通用神经网络组成。actor-critic体系结构使模型能够在高维状态空间和高维动作空间中表现良好，这对所有机器人应用都是至关重要的。该算法在保持智能体结构不变的前提下，对不同的多智能体范式进行了基准测试。强化学习是高度数据密集型的，需要明智地使用CPU和GPU内核来采样环境和训练，这里描述了这些细节。我们在基于开放式健身房的KUKA手臂机械臂上进行了严格的关节轨迹学习实验，与最先进的actor-critic和高级异步actor-critic (A3C)算法相比，我们提出的方法在300集内实现了学习稳定性，这两种算法都需要超过1000集来学习相同的任务，显示了我们提出的模型的有效性。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Development of Behavior based Robot manipulation using Actor-Critic architecture

Developing behavior based robotic manipulation is a very challenging but necessary task to be solved, especially for humanoid and social robots. Fundamental robotic tasks such as grasping, pick and place, trajectory following are at present solved using conventional forward and inverse kinematics (IK), dynamics and trajectory planning, whereas we learn these complex tasks using past experiences. In this paper, we explore developing behavior based robotic manipulation using reinforcement learning, more specifically learning directly from experiences through interactions with the real world and without knowing the transition model of the environment. Here, we propose a multi agent paradigm to gather experiences from multiple environments in parallel along with a model for populating new generation of agents using Evolutionary Actor-Critic Algorithm (EACA). The agents are of actor-critic architecture and both of them comprises of general purpose neural networks. The actor-critic architecture enables the model to perform well both in high dimensional state space and high dimensional action space which is very crucial for all robotic applications. The proposed algorithm is benchmarked with respect to different multi agent paradigm but keeping the agent’s architecture same. Reinforcement learning, being highly data intensive, requires the use of the CPU and GPU cores to be done judiciously for sampling the environment as well as for training, the details of which have been described here. We have run rigorous experiments for learning joint trajectories on the open gym based KUKA arm manipulator, where our proposed method achieves learning stability within 300 episodes, as compared to the state-of-the-art actor-critic and Advanced Asynchronous Actor-Critic (A3C) algorithms both of which take more than 1000 episodes for learning the same task, showing the effectiveness of our proposed model.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

2021 8th International Conference on Signal Processing and Integrated Networks (SPIN)

自引率

0.00%

发文量