A Backbone for Long-Horizon Robot Task Understanding

IF 4.6 | Zone 2 (Computer Science) | Q2 Robotics

IEEE Robotics and Automation Letters Pub Date : 2025-01-06 DOI:10.1109/LRA.2025.3526441

Xiaoshuai Chen;Wei Chen;Dongmyoung Lee;Yukun Ge;Nicolas Rojas;Petar Kormushev

IEEE Robotics and Automation Letters, vol. 10, no. 2, pp. 2048-2055. Journal Article. https://ieeexplore.ieee.org/document/10829642/

Citations: 0

Abstract

End-to-end robot learning, particularly for long-horizon tasks, often results in unpredictable outcomes and poor generalization. To address these challenges, we propose a novel Therblig-Based Backbone Framework (TBBF) as a fundamental structure to enhance interpretability, data efficiency, and generalization in robotic systems. TBBF uses expert demonstrations to enable therblig-level task decomposition, facilitate efficient action-object mapping, and generate adaptive trajectories for new scenarios. The approach consists of two stages: offline training and online testing. In the offline training stage, we develop the Meta-RGate SynerFusion (MGSF) network for accurate therblig segmentation across various tasks. In the online testing stage, once a one-shot demonstration of a new task has been collected, the MGSF network extracts high-level knowledge, which is then encoded into the image using Action Registration (ActionREG). Additionally, a Large Language Model (LLM)-Alignment Policy for Visual Correction (LAP-VC) is employed to ensure precise action registration, facilitating trajectory transfer in novel robot scenarios. Experimental results validate these methods: therblig segmentation achieves 94.37% recall, and real-world online robot testing achieves success rates of 94.4% and 80% in simple and complex scenarios, respectively.
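To make the segmentation metric concrete, the sketch below shows one common way recall could be computed for a frame-wise therblig labelling: per-class recall, averaged over the classes present in the ground truth. This is an illustrative assumption, not the paper's evaluation protocol; the abstract does not state how the 94.37% figure is aggregated. The function name `macro_recall` and the label names `reach`, `grasp`, and `move` are hypothetical.

```python
from collections import Counter


def macro_recall(true_labels, pred_labels):
    """Return (macro-averaged recall, per-class recall) for two label sequences.

    Assumption: one therblig label per frame; recall for a class is the
    fraction of its ground-truth frames that were predicted correctly.
    """
    if len(true_labels) != len(pred_labels):
        raise ValueError("label sequences must have the same length")
    if not true_labels:
        raise ValueError("label sequences must not be empty")

    # Number of ground-truth frames per class.
    support = Counter(true_labels)
    # Number of correctly predicted frames per class.
    hits = Counter(t for t, p in zip(true_labels, pred_labels) if t == p)

    per_class = {label: hits[label] / count for label, count in support.items()}
    macro = sum(per_class.values()) / len(per_class)
    return macro, per_class


# Hypothetical 10-frame demonstration: one "reach" frame is mislabelled as "grasp".
truth = ["reach"] * 4 + ["grasp"] * 2 + ["move"] * 4
pred = ["reach"] * 3 + ["grasp"] * 3 + ["move"] * 4

macro, per_class = macro_recall(truth, pred)
```

In this toy case the recall for `reach` is 3/4 and the other two classes are fully recovered. A segment-level or micro-averaged definition would give different numbers, so the choice of aggregation should be checked against the full paper.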




Source journal

IEEE Robotics and Automation Letters (Computer Science: Computer Science Applications)

CiteScore: 9.60

Self-citation rate: 15.40%

Articles published: 1428

Journal description: The scope of this journal is to publish peer-reviewed articles that provide a timely and concise account of innovative research ideas and application results, reporting significant theoretical findings and application case studies in areas of robotics and automation.