An Advisor-Based Architecture for a Sample-Efficient Training of Autonomous Navigation Agents with Reinforcement Learning

Impact Factor: 2.9, JCR Q2 (Robotics)

Robotics. Pub Date: 2023-09-28. DOI: 10.3390/robotics12050133

Rukshan Darshana Wijesinghe, Dumindu Tissera, Mihira Kasun Vithanage, Alex Xavier, Subha Fernando, Jayathu Samarawickrama


Citations: 0

Abstract

Recent advancements in artificial intelligence have enabled reinforcement learning (RL) agents to exceed human-level performance in various gaming tasks. However, despite the state-of-the-art performance demonstrated by model-free RL algorithms, they suffer from high sample complexity. Hence, their applications in robotics, autonomous navigation, and self-driving are uncommon, as gathering many samples is impractical on real-world hardware systems. Developing sample-efficient learning algorithms for RL agents is therefore crucial for deploying them in real-world tasks without sacrificing performance. This paper presents an advisor-based learning algorithm that incorporates prior knowledge into training by modifying the deep deterministic policy gradient algorithm to reduce the sample complexity. We also propose an effective method of employing an advisor in data collection to train autonomous navigation agents to maneuver physical platforms while minimizing the risk of collision. We analyze the performance of our methods in both simulation and physical experimental setups. Experiments reveal that incorporating an advisor into the training phase significantly reduces the sample complexity without compromising the agent’s performance compared to various benchmark approaches. They also show that the advisor’s constant involvement in the data collection process diminishes the agent’s performance, while limited involvement makes training more effective.
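The abstract does not give the algorithm itself, so the following is only a minimal sketch of one way "limited involvement" of an advisor in data collection could look. Everything in it is an assumption made for illustration: the linear decay schedule, the `cutoff_step` and `p_start` parameters, the Gaussian exploration noise, and the function names are hypothetical and are not taken from the paper.

```python
import random
from typing import Callable, List, Sequence, Tuple

Policy = Callable[[Sequence[float]], Sequence[float]]


def advisor_probability(step: int, cutoff_step: int, p_start: float = 0.9) -> float:
    """Chance of consulting the advisor at a given step.

    Assumed schedule (not from the paper): decay linearly from p_start
    to zero at cutoff_step, after which the advisor is never used.
    """
    if step >= cutoff_step:
        return 0.0
    return p_start * (1.0 - step / cutoff_step)


def select_action(
    state: Sequence[float],
    step: int,
    agent_policy: Policy,
    advisor_policy: Policy,
    cutoff_step: int,
    rng: random.Random,
    noise_std: float = 0.1,
) -> Tuple[List[float], str]:
    """Pick the action to execute and report which source produced it."""
    if rng.random() < advisor_probability(step, cutoff_step):
        # Prior knowledge: follow the hand-written advisor.
        return list(advisor_policy(state)), "advisor"
    # Otherwise act with the learned policy plus exploration noise,
    # as is common for deterministic actor-critic methods.
    noisy = [a + rng.gauss(0.0, noise_std) for a in agent_policy(state)]
    return noisy, "agent"


if __name__ == "__main__":
    rng = random.Random(0)
    agent = lambda s: [0.0, 0.0]        # placeholder learned policy
    advisor = lambda s: [0.5, -0.2]     # placeholder rule-based advisor
    sources = [select_action([0.0], t, agent, advisor, 100, rng)[1]
               for t in range(200)]
    print("advisor actions in first 100 steps:", sources[:100].count("advisor"))
    print("advisor actions after cutoff:", sources[100:].count("advisor"))
```

Under these assumptions the advisor dominates early transitions, when a collision on physical hardware is most likely, and drops out entirely after the cutoff so that the replay data increasingly reflects the agent's own behaviour.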


Source journal

Robotics (Mathematics: Control and Optimization)

CiteScore: 6.70

Self-citation rate: 8.10%

Articles published: 114

Review time: 11 weeks

Journal description: Robotics publishes original papers, technical reports, case studies, review papers and tutorials in all aspects of robotics. Special Issues devoted to important topics in advanced robotics will be published from time to time. It particularly welcomes emerging methodologies and techniques which bridge theoretical studies and applications and have significant potential for real-world applications. It provides a forum for information exchange between professionals, academicians and engineers who are working in the area of robotics, helping them to disseminate research findings and to learn from each other’s work. Suitable topics include, but are not limited to:
- intelligent robotics, mechatronics, and biomimetics
- novel and biologically-inspired robotics
- modelling, identification and control of robotic systems
- biomedical, rehabilitation and surgical robotics
- exoskeletons, prosthetics and artificial organs
- AI, neural networks and fuzzy logic in robotics
- multimodality human-machine interaction
- wireless sensor networks for robot navigation
- multi-sensor data fusion and SLAM