Bayesian controller fusion: Leveraging control priors in deep reinforcement learning for robotics

IF 7.5 | Zone 1, Computer Science | Q1 ROBOTICS

International Journal of Robotics Research Pub Date : 2021-07-21 DOI:10.1177/02783649231167210

Krishan Rana, Vibhavari Dasagi, Jesse Haviland, Ben Talbot, Michael Milford, N. Sunderhauf

International Journal of Robotics Research, vol. 42(1), pp. 123-146. https://doi.org/10.1177/02783649231167210

Citations: 11

Abstract

We present Bayesian Controller Fusion (BCF): a hybrid control strategy that combines the strengths of traditional hand-crafted controllers and model-free deep reinforcement learning (RL). BCF is well suited to the robotics domain, where reliable but suboptimal control priors exist for many tasks, but RL from scratch remains unsafe and data-inefficient. By fusing uncertainty-aware distributional outputs from each system, BCF arbitrates control between them, exploiting their respective strengths. We study BCF on two real-world robotics tasks: navigation in a vast, long-horizon environment, and a complex reaching task that involves manipulability maximisation. In both domains, simple handcrafted controllers exist that can solve the task at hand in a risk-averse manner but do not necessarily exhibit the optimal solution, given limitations in analytical modelling, controller miscalibration and task variation. Because exploration is naturally guided by the prior in the early stages of training, BCF accelerates learning, and as the policy gains more experience it improves substantially beyond the performance of the control prior. More importantly, given the risk-averse nature of the control prior, BCF ensures safe exploration and deployment: the control prior naturally dominates the action distribution in states unknown to the policy. We additionally show BCF’s applicability to the zero-shot sim-to-real setting and its ability to deal with out-of-distribution states in the real world. BCF is a promising approach towards combining the complementary strengths of deep RL and traditional robotic control, surpassing what either can achieve independently. The code and supplementary video material are made publicly available at https://krishanrana.github.io/bcf.
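The abstract does not spell out how the two distributional outputs are fused. As a minimal sketch only, assuming each system outputs a one-dimensional Gaussian over actions and that fusion is a normalised product of the two (a precision-weighted average), the arbitration behaviour described above could look like the following. The `Gaussian` type, the `fuse` function and all numeric values are hypothetical and are not taken from the authors' code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gaussian:
    """A 1-D Gaussian action distribution (hypothetical helper type)."""
    mean: float
    var: float  # variance, must be > 0


def fuse(policy: Gaussian, prior: Gaussian) -> Gaussian:
    """Normalised product of two Gaussians (assumed fusion rule).

    The fused mean is a precision-weighted average, so whichever
    input is more certain (smaller variance) pulls the result harder.
    """
    if policy.var <= 0 or prior.var <= 0:
        raise ValueError("variances must be positive")
    precision = 1.0 / policy.var + 1.0 / prior.var
    mean = (policy.mean / policy.var + prior.mean / prior.var) / precision
    return Gaussian(mean=mean, var=1.0 / precision)


# Illustrative values only.
control_prior = Gaussian(mean=0.0, var=1.0)

# State unknown to the policy: high policy variance, so the prior dominates.
early = fuse(Gaussian(mean=1.0, var=100.0), control_prior)

# Familiar state: low policy variance, so the learned policy dominates.
late = fuse(Gaussian(mean=1.0, var=0.01), control_prior)

print(early, late)
```

Under these assumptions, the fused action stays close to the control prior when the policy is uncertain and moves towards the policy's action as its uncertainty shrinks, which matches the qualitative behaviour the abstract describes.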


Source journal

International Journal of Robotics Research (Engineering & Technology: Robotics)

CiteScore

22.20

Self-citation rate

0.00%

Articles published

Review time

6-12 weeks

About the journal: The International Journal of Robotics Research (IJRR) has been a leading peer-reviewed publication in the field for over two decades. It holds the distinction of being the first scholarly journal dedicated to robotics research. IJRR presents cutting-edge and thought-provoking original research papers, articles, and reviews that delve into groundbreaking trends, technical advancements, and theoretical developments in robotics. Renowned scholars and practitioners contribute to its content, offering their expertise and insights. This journal covers a wide range of topics, going beyond narrow technical advancements to encompass various aspects of robotics. The primary aim of IJRR is to publish work that has lasting value for the scientific and technological advancement of the field. Only original, robust, and practical research that can serve as a foundation for further progress is considered for publication. The focus is on producing content that will remain valuable and relevant over time. In summary, IJRR stands as a prestigious publication that drives innovation and knowledge in robotics research.