Data-efficient Deep Reinforcement Learning Method Toward Scaling Continuous Robotic Task with Sparse Rewards

2021 IEEE International Conference on Real-time Computing and Robotics (RCAR), Pub Date: 2021-07-15, DOI: 10.1109/RCAR52367.2021.9517647

Junkai Ren, Yichuan Zhang, Yujun Zeng, Yixing Lan


Citations: 0

Abstract


Data-efficient Deep Reinforcement Learning Method Toward Scaling Continuous Robotic Task with Sparse Rewards

Dealing with robotic continuous control problems with sparse rewards is a longstanding challenge in deep reinforcement learning (DRL). While existing DRL algorithms have made great progress in learning policies from visual observations, learning effective policies still requires an impractical number of real-world data samples. Moreover, some robotic tasks are naturally specified with sparse rewards, which makes inefficient use of precious data and slows learning, to the point of making DRL infeasible. In addition, manually shaping reward functions is complex work because it requires specific domain knowledge and human intervention. To alleviate these issues, this paper proposes a model-free, off-policy RL approach named TD3MHER to learn manipulation policies for continuous robotic tasks with sparse rewards. Specifically, TD3MHER combines the Twin Delayed Deep Deterministic policy gradient algorithm (TD3) with Model-driven Hindsight Experience Replay (MHER) to achieve highly sample-efficient training. While the agent is learning the policy, TD3MHER also helps it learn the potential physical model of the robot, which is helpful for solving the task, and it does not require any new robot-environment interactions. The performance of TD3MHER is assessed on a simulated robotic task using a 7-DOF manipulator, both to compare the proposed technique with a previous DRL algorithm and to verify the usefulness of the method. Experimental results on the simulated robotic task show that the proposed approach successfully utilizes previously stored samples with sparse rewards and learns faster.
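To make the idea of reusing stored samples under sparse rewards concrete, the sketch below shows plain hindsight relabeling with the "final" goal strategy. It is not the paper's MHER, which according to the abstract is model-driven; the abstract gives no implementation details, so the `Transition` fields, the `tolerance` threshold, and the 0 / -1 reward convention are all assumptions chosen for illustration.

```python
import math
from typing import List, NamedTuple, Sequence, Tuple

Vector = Tuple[float, ...]


class Transition(NamedTuple):
    """One stored step of a goal-conditioned episode (hypothetical layout)."""
    state: Vector
    action: Vector
    next_state: Vector
    achieved_goal: Vector  # goal-space position actually reached after the action
    goal: Vector           # goal the agent was asked to reach
    reward: float


def sparse_reward(achieved_goal: Vector, goal: Vector, tolerance: float = 0.05) -> float:
    """Assumed sparse reward: 0 if within `tolerance` of the goal, otherwise -1."""
    return 0.0 if math.dist(achieved_goal, goal) <= tolerance else -1.0


def relabel_with_final_goal(
    episode: Sequence[Transition], tolerance: float = 0.05
) -> List[Transition]:
    """Copy an episode, pretending the last achieved position was the goal.

    No new environment interaction is needed: only the goal and reward
    fields of already-stored transitions are rewritten.
    """
    hindsight_goal = episode[-1].achieved_goal
    return [
        t._replace(
            goal=hindsight_goal,
            reward=sparse_reward(t.achieved_goal, hindsight_goal, tolerance),
        )
        for t in episode
    ]


# A failed 1-D episode: the agent wanted to reach 1.0 but only got to 0.3.
original_goal = (1.0,)
positions = [(0.0,), (0.1,), (0.2,), (0.3,)]
episode = [
    Transition(
        state=positions[i],
        action=(0.1,),
        next_state=positions[i + 1],
        achieved_goal=positions[i + 1],
        goal=original_goal,
        reward=sparse_reward(positions[i + 1], original_goal),
    )
    for i in range(3)
]

relabeled = relabel_with_final_goal(episode)
print([t.reward for t in episode])    # [-1.0, -1.0, -1.0]
print([t.reward for t in relabeled])  # [-1.0, -1.0, 0.0]
```

In an off-policy learner such as TD3, both the original and the relabeled transitions could be placed in the replay buffer, so that an episode which never reached its intended goal still contributes at least one non-negative reward signal.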
