{"title":"基于稀疏奖励的高速示范学习机械臂运动规划","authors":"Guoyu Zuo, Jiahao Lu, Tingting Pan","doi":"10.1109/ROBIO.2018.8665328","DOIUrl":null,"url":null,"abstract":"This paper proposed a high speed learning from demonstrations (LfD) method for sparse reward based motion planning problem of manipulator by using hindsight experience replay (HER) mechanism and deep deterministic policy gradient (DDPG) method. First, a demonstrations replay buffer and an agent exploration replay buffer are created for storing experience data, and the hindsight experience replay mechanism is subsequently used to acquire the experience data from the two replay buffers. Then, the deep deterministic policy gradient method is used to learn the experience data and finally fulfil the manipulator motion planning tasks under the sparse reward. Last, experiments on the pushing and pick-and-place tasks were conducted in the robotics environment in the gym. Results show that the training speed is increased to at least 10 times as compared to the deep deterministic policy gradient method without demonstrations data. In addition, the proposed method can effectively utilize the sparse reward, and the agent can quickly complete the task even under the low success rate of demonstrations data.","PeriodicalId":417415,"journal":{"name":"2018 IEEE International Conference on Robotics and Biomimetics (ROBIO)","volume":"235 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2018-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":"{\"title\":\"Sparse Reward Based Manipulator Motion Planning by Using High Speed Learning from Demonstrations\",\"authors\":\"Guoyu Zuo, Jiahao Lu, Tingting Pan\",\"doi\":\"10.1109/ROBIO.2018.8665328\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"This paper proposed a high speed learning from demonstrations (LfD) method for sparse reward based motion planning problem of manipulator by using hindsight experience replay (HER) mechanism and deep deterministic policy gradient (DDPG) method. First, a demonstrations replay buffer and an agent exploration replay buffer are created for storing experience data, and the hindsight experience replay mechanism is subsequently used to acquire the experience data from the two replay buffers. Then, the deep deterministic policy gradient method is used to learn the experience data and finally fulfil the manipulator motion planning tasks under the sparse reward. Last, experiments on the pushing and pick-and-place tasks were conducted in the robotics environment in the gym. Results show that the training speed is increased to at least 10 times as compared to the deep deterministic policy gradient method without demonstrations data. 
In addition, the proposed method can effectively utilize the sparse reward, and the agent can quickly complete the task even under the low success rate of demonstrations data.\",\"PeriodicalId\":417415,\"journal\":{\"name\":\"2018 IEEE International Conference on Robotics and Biomimetics (ROBIO)\",\"volume\":\"235 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2018-12-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"1\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2018 IEEE International Conference on Robotics and Biomimetics (ROBIO)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ROBIO.2018.8665328\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2018 IEEE International Conference on Robotics and Biomimetics (ROBIO)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ROBIO.2018.8665328","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Sparse Reward Based Manipulator Motion Planning by Using High Speed Learning from Demonstrations
This paper proposes a high-speed learning from demonstrations (LfD) method for the sparse-reward motion planning problem of manipulators, combining the hindsight experience replay (HER) mechanism with the deep deterministic policy gradient (DDPG) method. First, a demonstration replay buffer and an agent exploration replay buffer are created to store experience data, and the hindsight experience replay mechanism is then used to relabel the experience drawn from both buffers. Next, the deep deterministic policy gradient method learns from this experience data to fulfill the manipulator motion planning tasks under sparse rewards. Finally, experiments on pushing and pick-and-place tasks were conducted in the robotics environments of OpenAI Gym. Results show that training is at least 10 times faster than with the deep deterministic policy gradient method without demonstration data. In addition, the proposed method effectively exploits the sparse reward, and the agent can quickly complete the task even when the success rate of the demonstration data is low.
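The paper itself provides no source code; the Python sketch below only illustrates the mechanism the abstract describes: a demonstration buffer and an exploration buffer, both sampled through HER-style goal relabeling under a binary sparse reward. The buffer class, the "future" relabeling strategy, the demo_ratio mixing knob, and the 0/-1 reward convention are assumptions modeled on the standard HER setup for the Gym Fetch tasks, not details taken from the paper.

```python
import random
from collections import deque

import numpy as np


def sparse_reward(achieved_goal, desired_goal, threshold=0.05):
    """Binary sparse reward in the style of the Gym Fetch tasks (assumed):
    0 if the achieved goal is within `threshold` of the desired goal, else -1."""
    return 0.0 if np.linalg.norm(achieved_goal - desired_goal) < threshold else -1.0


class HERBuffer:
    """Goal-conditioned replay buffer with 'future'-style hindsight relabeling.

    Each stored episode is a list of transitions:
    (obs, action, achieved_goal, desired_goal, next_obs, next_achieved_goal).
    """

    def __init__(self, capacity=100_000, relabel_prob=0.8):
        self.episodes = deque(maxlen=capacity)
        self.relabel_prob = relabel_prob

    def store_episode(self, episode):
        self.episodes.append(episode)

    def sample(self, batch_size):
        batch = []
        for _ in range(batch_size):
            episode = random.choice(self.episodes)
            t = random.randrange(len(episode))
            obs, act, ag, goal, next_obs, next_ag = episode[t]
            # With some probability, pretend a goal actually reached later in
            # the episode was the intended goal (HER's "future" strategy), so
            # failed rollouts still yield useful learning signal.
            if random.random() < self.relabel_prob:
                future = random.randrange(t, len(episode))
                goal = episode[future][5]  # next_achieved_goal at a future step
            reward = sparse_reward(next_ag, goal)  # recompute under the new goal
            batch.append((np.concatenate([obs, goal]), act, reward,
                          np.concatenate([next_obs, goal])))
        return batch


def sample_mixed_batch(demo_buffer, explore_buffer, batch_size, demo_ratio=0.25):
    """Draw a training batch from both buffers, as the abstract describes:
    one buffer pre-filled with demonstration episodes, one filled by the
    agent's own exploration. The fixed demo_ratio is an illustrative choice;
    the paper's exact mixing scheme may differ."""
    n_demo = int(batch_size * demo_ratio)
    batch = demo_buffer.sample(n_demo) + explore_buffer.sample(batch_size - n_demo)
    random.shuffle(batch)
    return batch  # (state, action, reward, next_state) tuples for a DDPG update
```

In use, demonstration episodes would be loaded into one HERBuffer before training, exploration episodes appended to the other as rollouts finish, and each mixed batch fed to an ordinary DDPG actor-critic update; the relabeling is what makes the 0/-1 sparse reward informative enough to learn from.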