A Backbone for Long-Horizon Robot Task Understanding

IF 4.6 | CAS Tier 2 (Computer Science) | JCR Q2 (Robotics)
Xiaoshuai Chen;Wei Chen;Dongmyoung Lee;Yukun Ge;Nicolas Rojas;Petar Kormushev
{"title":"A Backbone for Long-Horizon Robot Task Understanding","authors":"Xiaoshuai Chen;Wei Chen;Dongmyoung Lee;Yukun Ge;Nicolas Rojas;Petar Kormushev","doi":"10.1109/LRA.2025.3526441","DOIUrl":null,"url":null,"abstract":"End-to-end robotlearning, particularly for long-horizon tasks, often results in unpredictable outcomes and poor generalization. To address these challenges, we propose a novel <italic>Therblig-Based Backbone Framework (TBBF)</i> as a fundamental structure to enhance interpretability, data efficiency, and generalization in robotic systems. TBBF utilizes expert demonstrations to enable therblig-level task decomposition, facilitate efficient action-object mapping, and generate adaptive trajectories for new scenarios. The approach consists of two stages: offline training and online testing. During the offline training stage, we developed the <italic>Meta-RGate SynerFusion (MGSF)</i> network for accurate therblig segmentation across various tasks. In the online testing stage, after a one-shot demonstration of a new task is collected, our <italic>MGSF</i> network extracts high-level knowledge, which is then encoded into the image using <italic>Action Registration (ActionREG)</i>. Additionally, <italic>Large Language Model (LLM)-Alignment Policy for Visual Correction (LAP-VC)</i> is employed to ensure precise action registration, facilitating trajectory transfer in novel robot scenarios. Experimental results validate these methods, achieving 94.37% recall in therblig segmentation and success rates of 94.4% and 80% in real-world online robot testing for simple and complex scenarios, respectively.","PeriodicalId":13241,"journal":{"name":"IEEE Robotics and Automation Letters","volume":"10 2","pages":"2048-2055"},"PeriodicalIF":4.6000,"publicationDate":"2025-01-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Robotics and Automation Letters","FirstCategoryId":"94","ListUrlMain":"https://ieeexplore.ieee.org/document/10829642/","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"ROBOTICS","Score":null,"Total":0}
Citations: 0

Abstract

End-to-end robot learning, particularly for long-horizon tasks, often results in unpredictable outcomes and poor generalization. To address these challenges, we propose a novel Therblig-Based Backbone Framework (TBBF) as a fundamental structure to enhance interpretability, data efficiency, and generalization in robotic systems. TBBF utilizes expert demonstrations to enable therblig-level task decomposition, facilitate efficient action-object mapping, and generate adaptive trajectories for new scenarios. The approach consists of two stages: offline training and online testing. During the offline training stage, we developed the Meta-RGate SynerFusion (MGSF) network for accurate therblig segmentation across various tasks. In the online testing stage, after a one-shot demonstration of a new task is collected, our MGSF network extracts high-level knowledge, which is then encoded into the image using Action Registration (ActionREG). Additionally, a Large Language Model (LLM)-Alignment Policy for Visual Correction (LAP-VC) is employed to ensure precise action registration, facilitating trajectory transfer in novel robot scenarios. Experimental results validate these methods, achieving 94.37% recall in therblig segmentation and success rates of 94.4% and 80% in real-world online robot testing for simple and complex scenarios, respectively.
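For illustration only, the following is a minimal Python sketch of how the two-stage pipeline described in the abstract might be organized. All class, function, and field names are hypothetical placeholders; MGSF, ActionREG, and LAP-VC are represented by stub functions, not the authors' actual implementations.

from dataclasses import dataclass

# Hypothetical types and stubs; none of this is the paper's released code.

@dataclass
class Therblig:
    """One elemental motion unit, e.g. reach, grasp, move, release."""
    name: str
    start_frame: int
    end_frame: int

def segment_therbligs(demo_frames):
    """Stand-in for the offline-trained MGSF segmenter: decompose a
    one-shot demonstration into therblig-level segments. This stub
    simply labels the whole demonstration as a single 'reach'."""
    return [Therblig("reach", 0, len(demo_frames) - 1)]

def register_actions(therbligs, scene_image):
    """Stand-in for ActionREG: map each therblig onto an object or
    location detected in the current scene image."""
    return [(t.name, {"target": "object_0"}) for t in therbligs]

def correct_with_llm(registered_actions):
    """Stand-in for LAP-VC: an LLM-alignment pass that would verify and
    adjust the registrations; here it passes them through unchanged."""
    return registered_actions

def online_test(demo_frames, scene_image):
    """Online stage: one-shot demo -> therbligs -> registered actions,
    corrected before execution."""
    therbligs = segment_therbligs(demo_frames)             # MGSF
    registered = register_actions(therbligs, scene_image)  # ActionREG
    return correct_with_llm(registered)                    # LAP-VC

if __name__ == "__main__":
    plan = online_test(demo_frames=list(range(100)), scene_image=None)
    print(plan)  # -> [('reach', {'target': 'object_0'})]

Under this reading of the abstract, the offline stage would train the segmenter on expert demonstrations, while the online stage reuses it unchanged on a single new demonstration before registration and LLM-based correction.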
Source Journal

IEEE Robotics and Automation Letters (Computer Science: Computer Science Applications)
CiteScore: 9.60
Self-citation rate: 15.40%
Annual publications: 1428

Aims and scope: The scope of this journal is to publish peer-reviewed articles that provide a timely and concise account of innovative research ideas and application results, reporting significant theoretical findings and application case studies in areas of robotics and automation.