LLM-driven symbolic planning and hierarchical imitation learning for long-horizon deformable object assembly
Jiaming Qi, Liang Lu, Fangyuan Wang, Hoi-Yin Lee, David Navarro-Alarcon, Zeqing Zhang, Peng Zhou
Robotics and Computer-Integrated Manufacturing, Vol. 97, Article 103096
DOI: 10.1016/j.rcim.2025.103096
Published: 2025-07-28
Citations: 0
Abstract
Long-horizon assembly tasks involving deformable objects pose substantial challenges for autonomous robots, stemming from infinite-dimensional state spaces, complex sequential dependencies, and high variability in real-world conditions. In this work, we propose a novel and robust framework that tightly integrates Large Language Model (LLM)-driven symbolic planning with hierarchical imitation learning to enable reliable and generalizable solutions for deformable object assembly. Our approach leverages the advanced reasoning capabilities of LLMs to translate natural language task instructions into structured symbolic task plans. This decomposition is initiated by a vision-language model (VLM) that generates descriptive subgoals from key visual frames of a human demonstration. Each subgoal is then automatically grounded in the robot's perception via a VLM query mechanism, ensuring precise and task-relevant state estimation. For execution, a 3D diffusion policy (DP3) conditioned on visual input and symbolic subgoals generates smooth, low-level action trajectories, bridging the gap between high-level symbolic reasoning and dexterous manipulation. We validate our hierarchical framework on a real-world round belt drive assembly benchmark, demonstrating significant improvements in success rates, error recovery, and generalization across diverse and perturbed initial conditions, compared to existing approaches. Our results highlight the potential of integrating LLM-based symbolic abstraction, targeted state querying, and diffusion-based visuomotor control for robust, autonomous assembly of deformable objects in unstructured environments.
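The hierarchical pipeline the abstract describes, LLM/VLM subgoal decomposition, VLM-based state grounding, and DP3 low-level control, can be sketched as a simple control loop. Everything below is an illustrative stub, not the paper's implementation: the function names, the hard-coded round-belt plan, and the boolean perception flags are all assumptions made for the sake of the sketch.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

@dataclass
class Subgoal:
    """A symbolic subgoal, as the LLM/VLM planner might produce."""
    description: str   # natural-language description of the subgoal
    state_key: str     # perception flag the VLM query would check

def plan_subgoals(instruction: str) -> List[Subgoal]:
    # Stand-in for LLM-driven symbolic planning: a real system would query
    # an LLM with the instruction and key demonstration frames. The
    # round-belt plan here is hard-coded purely for illustration.
    return [
        Subgoal("grasp the round belt", "belt_grasped"),
        Subgoal("loop the belt over the driven pulley", "belt_on_pulley"),
        Subgoal("tension the belt onto the drive pulley", "belt_tensioned"),
    ]

def query_state(state: Dict[str, bool], subgoal: Subgoal) -> bool:
    # Stand-in for the VLM query mechanism that grounds a subgoal
    # in the robot's perceived state.
    return state.get(subgoal.state_key, False)

def diffusion_policy_step(state: Dict[str, bool],
                          subgoal: Subgoal) -> Dict[str, bool]:
    # Stand-in for the DP3 low-level policy: a real policy would emit an
    # action trajectory conditioned on visual input and the subgoal.
    # Here a single "step" simply achieves the subgoal's perception flag.
    new_state = dict(state)
    new_state[subgoal.state_key] = True
    return new_state

def execute(instruction: str, state: Dict[str, bool],
            max_steps: int = 10) -> Tuple[bool, Dict[str, bool]]:
    # Hierarchical execution loop: advance through subgoals in order,
    # re-checking grounding after every low-level step; a subgoal that
    # times out signals failure, where replanning could be triggered.
    for subgoal in plan_subgoals(instruction):
        for _ in range(max_steps):
            if query_state(state, subgoal):
                break
            state = diffusion_policy_step(state, subgoal)
        else:
            return False, state
    return True, state
```

The separation of `plan_subgoals`, `query_state`, and `diffusion_policy_step` mirrors the abstract's division of labor: symbolic reasoning decides *what* to do next, perception queries decide *whether* it is done, and the visuomotor policy decides *how* to move.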
Journal Introduction
The journal, Robotics and Computer-Integrated Manufacturing, focuses on sharing research applications that contribute to the development of new or enhanced robotics, manufacturing technologies, and innovative manufacturing strategies that are relevant to industry. Papers that combine theory and experimental validation are preferred, while review papers on current robotics and manufacturing issues are also considered. However, papers on traditional machining processes, modeling and simulation, supply chain management, and resource optimization are generally not within the scope of the journal, as there are more appropriate journals for these topics. Similarly, papers that are overly theoretical or mathematical will be directed to other suitable journals. The journal welcomes original papers in areas such as industrial robotics, human-robot collaboration in manufacturing, cloud-based manufacturing, cyber-physical production systems, big data analytics in manufacturing, smart mechatronics, machine learning, adaptive and sustainable manufacturing, and other fields involving unique manufacturing technologies.