VLM-MSGraph: Vision Language Model-enabled Multi-hierarchical Scene Graph for robotic assembly

IF 9.1 | CAS Region 1 (Computer Science) | Q1 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS
Shufei Li , Zhijie Yan , Zuoxu Wang , Yiping Gao
Journal: Robotics and Computer-integrated Manufacturing, Volume 94, Article 102978
DOI: 10.1016/j.rcim.2025.102978
Published: 2025-02-16
Citations: 0

Abstract

Intelligent robotic assembly is becoming a pivotal component of the manufacturing sector, driven by growing demands for flexibility, sustainability, and resilience. Robots in manufacturing environments need perception, decision-making, and manipulation skills to support the flexible production of diverse products. However, traditional robotic assembly systems typically rely on time-consuming training processes specific to fixed settings, lacking generalization and zero-shot learning capabilities. To address these challenges, this paper introduces a Vision Language Model-enabled Multi-hierarchical Scene Graph (VLM-MSGraph) approach for robotic assembly, featuring generalized assembly sequence learning and 3D manipulation in open scenarios. The MSGraph incorporates high-level task planning structured as triplets, organized by multiple VLM agents. At a low level, the MSGraph retains 3D spatial relationships between industrial parts, enabling the robot to perform assembly tasks while accounting for object geometry for effective manipulation. Assembly drawings, physics simulations, and assembly tasks in a laboratory setting are used to evaluate the proposed system, advancing flexible automation in robotics.
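The two levels described above — high-level task planning as triplets and low-level 3D spatial relations between parts — can be sketched as a simple data structure. This is a minimal illustrative sketch, not the paper's actual implementation: the class names, the `(subject, action, object)` triplet encoding, and the coordinate-based relation test are all assumptions for exposition.

```python
from dataclasses import dataclass, field

# Hypothetical sketch of a two-level scene graph inspired by the abstract:
# a high level of task triplets and a low level of 3D part geometry.
# All names and fields here are illustrative assumptions, not the paper's API.

@dataclass
class Part:
    name: str
    position: tuple  # (x, y, z) centroid in the robot frame (assumed convention)

@dataclass
class MSGraph:
    parts: dict = field(default_factory=dict)          # low level: part name -> Part
    task_triplets: list = field(default_factory=list)  # high level: (subject, action, object)

    def add_part(self, part: Part) -> None:
        self.parts[part.name] = part

    def add_task(self, subject: str, action: str, obj: str) -> None:
        # In the paper these triplets are organized by multiple VLM agents;
        # here they are simply appended in planning order.
        self.task_triplets.append((subject, action, obj))

    def spatial_relation(self, a: str, b: str) -> str:
        # Coarse vertical relation derived from stored 3D positions (z-axis up).
        za, zb = self.parts[a].position[2], self.parts[b].position[2]
        return "above" if za > zb else "below" if za < zb else "level_with"

g = MSGraph()
g.add_part(Part("bolt", (0.1, 0.0, 0.30)))
g.add_part(Part("bracket", (0.1, 0.0, 0.10)))
g.add_task("robot", "insert", "bolt")
print(g.spatial_relation("bolt", "bracket"))  # bolt centroid is higher -> "above"
```

Keeping the symbolic triplets separate from the metric geometry mirrors the abstract's split between generalizable sequence planning and geometry-aware manipulation.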
Source journal
Robotics and Computer-integrated Manufacturing (Engineering: Manufacturing)
CiteScore: 24.10
Self-citation rate: 13.50%
Articles per year: 160
Review time: 50 days
Aims and scope: The journal, Robotics and Computer-Integrated Manufacturing, focuses on sharing research applications that contribute to the development of new or enhanced robotics, manufacturing technologies, and innovative manufacturing strategies that are relevant to industry. Papers that combine theory and experimental validation are preferred, while review papers on current robotics and manufacturing issues are also considered. However, papers on traditional machining processes, modeling and simulation, supply chain management, and resource optimization are generally not within the scope of the journal, as there are more appropriate journals for these topics. Similarly, papers that are overly theoretical or mathematical will be directed to other suitable journals. The journal welcomes original papers in areas such as industrial robotics, human-robot collaboration in manufacturing, cloud-based manufacturing, cyber-physical production systems, big data analytics in manufacturing, smart mechatronics, machine learning, adaptive and sustainable manufacturing, and other fields involving unique manufacturing technologies.