Learning Navigation Policies for Mobile Robots in Deep Reinforcement Learning with Random Network Distillation

Lifan Pan, Anyi Li, Jun Ma, Jianmin Ji
{"title":"基于随机网络蒸馏深度强化学习的移动机器人导航策略学习","authors":"Lifan Pan, Anyi Li, Jun Ma, Jianmin Ji","doi":"10.1145/3461353.3461365","DOIUrl":null,"url":null,"abstract":"Learning navigation policies considers the task of training a model that can find collision-free paths for mobile robots, where various Deep Reinforcement Learning (DRL) methods have been applied with promising results. However, the natural reward function for the task is usually sparse, i.e., obtaining a penalty for the collision and a positive reward for arriving the target position, which makes it difficult to learn. In particular, for some complex navigation environments, it is hard to search a collision-free path by the random exploration, which leads to a rather slow learning speed and solutions with poor performance. In this paper, we propose a DRL based approach to train an end-to-end navigation planner, i.e, the policy neural network, that directly translates the local grid map and the relative goal of the robot into its moving actions. To handle the sparse reward problem, we augment the normal extrinsic reward from the environment with intrinsic reward signals measured by random network distillation (RND). In specific, the intrinsic reward is calculated by two different networks from RND, which encourages the agent to explore a state that has not been seen before. The experimental results show that by augmenting the reward function with intrinsic reward signals by RND, solutions with better performance can be learned more efficiently and more stably in our approach. We also deploy the trained model to a real robot, which can perform collision avoidance in navigation tasks without any parameter tuning. A video of our experiments can be found at https://youtu.be/b1GJrWfO8pw.","PeriodicalId":114871,"journal":{"name":"Proceedings of the 2021 5th International Conference on Innovation in Artificial Intelligence","volume":"72 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-03-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"3","resultStr":"{\"title\":\"Learning Navigation Policies for Mobile Robots in Deep Reinforcement Learning with Random Network Distillation\",\"authors\":\"Lifan Pan, Anyi Li, Jun Ma, Jianmin Ji\",\"doi\":\"10.1145/3461353.3461365\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Learning navigation policies considers the task of training a model that can find collision-free paths for mobile robots, where various Deep Reinforcement Learning (DRL) methods have been applied with promising results. However, the natural reward function for the task is usually sparse, i.e., obtaining a penalty for the collision and a positive reward for arriving the target position, which makes it difficult to learn. In particular, for some complex navigation environments, it is hard to search a collision-free path by the random exploration, which leads to a rather slow learning speed and solutions with poor performance. In this paper, we propose a DRL based approach to train an end-to-end navigation planner, i.e, the policy neural network, that directly translates the local grid map and the relative goal of the robot into its moving actions. To handle the sparse reward problem, we augment the normal extrinsic reward from the environment with intrinsic reward signals measured by random network distillation (RND). 
In specific, the intrinsic reward is calculated by two different networks from RND, which encourages the agent to explore a state that has not been seen before. The experimental results show that by augmenting the reward function with intrinsic reward signals by RND, solutions with better performance can be learned more efficiently and more stably in our approach. We also deploy the trained model to a real robot, which can perform collision avoidance in navigation tasks without any parameter tuning. A video of our experiments can be found at https://youtu.be/b1GJrWfO8pw.\",\"PeriodicalId\":114871,\"journal\":{\"name\":\"Proceedings of the 2021 5th International Conference on Innovation in Artificial Intelligence\",\"volume\":\"72 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2021-03-05\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"3\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the 2021 5th International Conference on Innovation in Artificial Intelligence\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3461353.3461365\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 2021 5th International Conference on Innovation in Artificial Intelligence","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3461353.3461365","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 3

Abstract

Learning navigation policies is the task of training a model that finds collision-free paths for mobile robots, and various Deep Reinforcement Learning (DRL) methods have been applied to it with promising results. However, the natural reward function for the task is usually sparse, i.e., a penalty for a collision and a positive reward for arriving at the target position, which makes the policy difficult to learn. In particular, in some complex navigation environments it is hard to find a collision-free path through random exploration, which leads to slow learning and solutions with poor performance. In this paper, we propose a DRL-based approach to train an end-to-end navigation planner, i.e., a policy neural network that directly translates the local grid map and the relative goal of the robot into its moving actions. To handle the sparse reward problem, we augment the normal extrinsic reward from the environment with intrinsic reward signals measured by Random Network Distillation (RND). Specifically, the intrinsic reward is calculated from the two networks of RND, which encourages the agent to explore states it has not seen before. The experimental results show that by augmenting the reward function with intrinsic reward signals from RND, our approach learns better-performing solutions more efficiently and more stably. We also deploy the trained model on a real robot, which performs collision avoidance in navigation tasks without any parameter tuning. A video of our experiments can be found at https://youtu.be/b1GJrWfO8pw.
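
For readers who want a concrete picture of the RND intrinsic reward mentioned in the abstract, the sketch below illustrates the two-network idea: a fixed, randomly initialized target network and a trainable predictor network whose prediction error on a state serves as the exploration bonus. This is a minimal PyTorch sketch under stated assumptions (the observation layout as a flattened local grid map plus relative goal, the network sizes, and the mixing coefficient beta are all illustrative), not the authors' implementation.

```python
# Minimal sketch of the RND intrinsic-reward idea (PyTorch), not the authors' code.
# OBS_DIM, EMB_DIM, and the reward-mixing coefficient are illustrative assumptions.
import torch
import torch.nn as nn

OBS_DIM = 24 * 24 + 2   # assumed: 24x24 local grid map flattened, plus relative goal (x, y)
EMB_DIM = 64            # assumed size of the random feature embedding


def make_net() -> nn.Sequential:
    return nn.Sequential(
        nn.Linear(OBS_DIM, 128), nn.ReLU(),
        nn.Linear(128, EMB_DIM),
    )


target = make_net()          # randomly initialized and kept fixed
for p in target.parameters():
    p.requires_grad_(False)

predictor = make_net()       # trained to imitate the target on visited states
optimizer = torch.optim.Adam(predictor.parameters(), lr=1e-4)


def intrinsic_reward(obs: torch.Tensor) -> torch.Tensor:
    """Per-state prediction error; large for states the predictor has rarely seen."""
    with torch.no_grad():
        return (predictor(obs) - target(obs)).pow(2).mean(dim=-1)


def update_predictor(obs_batch: torch.Tensor) -> float:
    """Train the predictor on visited states so familiar states stop looking novel."""
    loss = (predictor(obs_batch) - target(obs_batch)).pow(2).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()


# Policy learning would then use something like
#     r_total = r_extrinsic + beta * intrinsic_reward(obs)
# where beta is a weighting coefficient (an assumption; the paper's exact scheme may differ).
```

Because the target network stays fixed while the predictor is trained only on visited states, the prediction error shrinks for familiar states and remains large for unseen ones, which is exactly the kind of exploration signal that compensates for a sparse extrinsic reward.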