LLM-driven symbolic planning and hierarchical imitation learning for long-horizon deformable object assembly
Jiaming Qi, Liang Lu, Fangyuan Wang, Hoi-Yin Lee, David Navarro-Alarcon, Zeqing Zhang, Peng Zhou
Robotics and Computer-Integrated Manufacturing, Vol. 97, Article 103096
DOI: 10.1016/j.rcim.2025.103096
Published: 2025-07-28
Citations: 0
Abstract
Long-horizon assembly tasks involving deformable objects pose substantial challenges for autonomous robots, stemming from infinite-dimensional state spaces, complex sequential dependencies, and high variability in real-world conditions. In this work, we propose a novel and robust framework that tightly integrates Large Language Model (LLM)-driven symbolic planning with hierarchical imitation learning to enable reliable and generalizable solutions for deformable object assembly. Our approach leverages the advanced reasoning capabilities of LLMs to translate natural language task instructions into structured symbolic task plans. This decomposition is initiated by a vision-language model (VLM) that generates descriptive subgoals from key visual frames of a human demonstration. Each subgoal is then automatically grounded in the robot's perception via a VLM query mechanism, ensuring precise and task-relevant state estimation. For execution, a 3D diffusion policy (DP3) conditioned on visual input and symbolic subgoals generates smooth, low-level action trajectories, bridging the gap between high-level symbolic reasoning and dexterous manipulation. We validate our hierarchical framework on a real-world round belt drive assembly benchmark, demonstrating significant improvements in success rates, error recovery, and generalization across diverse and perturbed initial conditions, compared to existing approaches. Our results highlight the potential of integrating LLM-based symbolic abstraction, targeted state querying, and diffusion-based visuomotor control for robust, autonomous assembly of deformable objects in unstructured environments.
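The hierarchical pipeline the abstract describes, LLM/VLM subgoal decomposition, VLM-based state grounding, and DP3 low-level control, can be sketched as a simple control loop. Everything below is an illustrative stub, not the paper's implementation: the function names, the hard-coded round-belt plan, and the boolean perception flags are all assumptions made for the sake of the sketch.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

@dataclass
class Subgoal:
    """A symbolic subgoal, as the LLM/VLM planner might produce."""
    description: str   # natural-language description of the subgoal
    state_key: str     # perception flag the VLM query would check

def plan_subgoals(instruction: str) -> List[Subgoal]:
    # Stand-in for LLM-driven symbolic planning: a real system would query
    # an LLM with the instruction and key demonstration frames. The
    # round-belt plan here is hard-coded purely for illustration.
    return [
        Subgoal("grasp the round belt", "belt_grasped"),
        Subgoal("loop the belt over the driven pulley", "belt_on_pulley"),
        Subgoal("tension the belt onto the drive pulley", "belt_tensioned"),
    ]

def query_state(state: Dict[str, bool], subgoal: Subgoal) -> bool:
    # Stand-in for the VLM query mechanism that grounds a subgoal
    # in the robot's perceived state.
    return state.get(subgoal.state_key, False)

def diffusion_policy_step(state: Dict[str, bool],
                          subgoal: Subgoal) -> Dict[str, bool]:
    # Stand-in for the DP3 low-level policy: a real policy would emit an
    # action trajectory conditioned on visual input and the subgoal.
    # Here a single "step" simply achieves the subgoal's perception flag.
    new_state = dict(state)
    new_state[subgoal.state_key] = True
    return new_state

def execute(instruction: str, state: Dict[str, bool],
            max_steps: int = 10) -> Tuple[bool, Dict[str, bool]]:
    # Hierarchical execution loop: advance through subgoals in order,
    # re-checking grounding after every low-level step; a subgoal that
    # times out signals failure, where replanning could be triggered.
    for subgoal in plan_subgoals(instruction):
        for _ in range(max_steps):
            if query_state(state, subgoal):
                break
            state = diffusion_policy_step(state, subgoal)
        else:
            return False, state
    return True, state
```

The separation of `plan_subgoals`, `query_state`, and `diffusion_policy_step` mirrors the abstract's division of labor: symbolic reasoning decides *what* to do next, perception queries decide *whether* it is done, and the visuomotor policy decides *how* to move.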
Journal Introduction
The journal, Robotics and Computer-Integrated Manufacturing, focuses on sharing research applications that contribute to the development of new or enhanced robotics, manufacturing technologies, and innovative manufacturing strategies that are relevant to industry. Papers that combine theory and experimental validation are preferred, while review papers on current robotics and manufacturing issues are also considered. However, papers on traditional machining processes, modeling and simulation, supply chain management, and resource optimization are generally not within the scope of the journal, as there are more appropriate journals for these topics. Similarly, papers that are overly theoretical or mathematical will be directed to other suitable journals. The journal welcomes original papers in areas such as industrial robotics, human-robot collaboration in manufacturing, cloud-based manufacturing, cyber-physical production systems, big data analytics in manufacturing, smart mechatronics, machine learning, adaptive and sustainable manufacturing, and other fields involving unique manufacturing technologies.