Online Damage Recovery for Physical Robots with Hierarchical Quality-Diversity

ACM Transactions on Evolutionary Learning | Pub Date: 2022-10-18 | DOI: 10.1145/3596912

Maxime Allard, Simón C. Smith, Konstantinos Chatzilygeroudis, Bryan Lim, Antoine Cully


Citations: 3

Abstract

In real-world environments, robots need to be resilient to damage and robust to unforeseen scenarios. Quality-Diversity (QD) algorithms have been successfully used to make robots adapt to damage in seconds by leveraging a diverse set of learned skills. A high diversity of skills increases a robot's chances of overcoming new situations, since there are more potential alternatives for solving a new task. However, finding and storing a large behavioural diversity of multiple skills often leads to an increase in computational complexity. Furthermore, robot planning in a large skill space becomes an additional challenge as the number of skills grows. Hierarchical structures can help to reduce this search and storage complexity by breaking down skills into primitive skills. In this article, we extend the analysis of the Hierarchical Trial and Error algorithm, which uses a hierarchical behavioural repertoire to learn diverse skills and leverages them to make the robot adapt quickly in the physical world. We show that the hierarchical decomposition of skills enables the robot to learn more complex behaviours while keeping the learning of the repertoire tractable. Experiments with a hexapod robot, both in simulation and in the physical world, show that our method solves a maze navigation task with up to 20% and 43% fewer actions, respectively, than the best baselines, while having 78% fewer complete failures.
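The idea of breaking complex skills down into primitive skills can be illustrated with a toy sketch. This is not the authors' Hierarchical Trial and Error implementation: the grid-displacement descriptors, the names `primitive_repertoire` and `build_composite_repertoire`, and the "keep the first sequence found per descriptor" rule are all assumptions chosen for illustration. The point is only that the higher level stores references to primitives rather than new low-level controllers.

```python
from itertools import product

# Hypothetical primitive repertoire: each entry maps a behaviour descriptor
# (here, a unit displacement on a grid) to the name of a low-level controller.
primitive_repertoire = {
    (1, 0): "forward",
    (-1, 0): "backward",
    (0, 1): "left",
    (0, -1): "right",
}


def build_composite_repertoire(primitives, length):
    """Build a higher-level repertoire from sequences of primitive skills.

    Each composite skill is indexed by its net displacement and stores only
    the sequence of primitive descriptors that produces it. One sequence is
    kept per net displacement (the first one encountered), loosely mirroring
    how a QD archive keeps one solution per descriptor cell.
    """
    composite = {}
    for sequence in product(primitives, repeat=length):
        dx = sum(step[0] for step in sequence)
        dy = sum(step[1] for step in sequence)
        # setdefault keeps the existing entry if this cell is already filled.
        composite.setdefault((dx, dy), sequence)
    return composite


composite_repertoire = build_composite_repertoire(primitive_repertoire, length=2)

# Executing a composite skill means looking up each primitive in turn.
for descriptor, sequence in sorted(composite_repertoire.items()):
    controllers = [primitive_repertoire[step] for step in sequence]
    print(descriptor, "->", controllers)
```

In this sketch the upper level never duplicates controller parameters; it only indexes into the primitive repertoire, which is one way a hierarchy can keep storage manageable as the number of skills grows.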

