Large vision-language models enabled novel objects 6D pose estimation for human-robot collaboration

Wanqing Xia, Hao Zheng, Weiliang Xu, Xun Xu

Robotics and Computer-integrated Manufacturing, Volume 95, Article 103030. Published 2025-04-26. DOI: 10.1016/j.rcim.2025.103030
Six-Degree-of-Freedom (6D) pose estimation is essential for robotic manipulation tasks, especially in human-robot collaboration environments. Recently, 6D pose estimation has been extended from seen objects to novel objects, because unfamiliar items are frequently encountered in real-life scenarios. This paper presents a three-stage pipeline for 6D pose estimation of previously unseen objects, leveraging the capabilities of large vision-language models. Our approach consists of vision-language model-based object detection and segmentation, mask selection with pose hypotheses generated from CAD models, and refinement and scoring of pose candidates. We evaluate our method on the YCB-Video dataset, achieving a state-of-the-art Average Recall (AR) score of 75.8 with RGB-D images, demonstrating its effectiveness in accurately estimating 6D poses for a diverse range of objects. An ablation study examines the contribution of each stage. To validate the practical applicability of our approach, we conduct case studies on a real-world robotic platform, focusing on object pick-up tasks by integrating our 6D pose estimation pipeline with human intention prediction and task analysis algorithms. Results show that the proposed method can effectively handle novel objects in our test environments, as demonstrated through the YCB dataset evaluation and case studies. Our work contributes to the field of human-robot collaboration by introducing a flexible, generalizable approach to 6D pose estimation, enabling robots to adapt to new objects without requiring extensive retraining—a vital capability for advancing human-robot collaboration in dynamic environments. More information can be found on the project GitHub page: https://github.com/WanqingXia/HRC_DetAnyPose.
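The three stages named in the abstract can be sketched as a minimal control-flow skeleton. This is a schematic sketch only: every class, function, and parameter name below is a hypothetical placeholder, not the authors' actual API (see the project repository for the real implementation), and the stage bodies are stubs standing in for the vision-language model, the CAD-based hypothesis sampler, and the depth-based refiner.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical container for a pose candidate; field names are illustrative.
@dataclass
class PoseCandidate:
    rotation: List[List[float]]   # 3x3 rotation matrix
    translation: List[float]      # 3-vector, in metres
    score: float = 0.0

def detect_and_segment(rgb_image, text_prompt):
    # Stage 1 (stub): a vision-language model would propose segmentation
    # masks for the object named in the text prompt.
    return [{"mask": None, "label": text_prompt}]

def generate_hypotheses(masks, cad_model, num_hypotheses=3):
    # Stage 2 (stub): select the best mask and sample pose hypotheses
    # from rendered views of the CAD model.
    identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    return [PoseCandidate(identity, [0.0, 0.0, 0.5])
            for _ in range(num_hypotheses)]

def refine_and_score(candidates, depth_image):
    # Stage 3 (stub): refine each candidate against the depth image,
    # score it, and return the highest-scoring pose.
    for i, c in enumerate(candidates):
        c.score = 1.0 / (1 + i)   # placeholder scoring rule
    return max(candidates, key=lambda c: c.score)

def estimate_pose(rgb, depth, prompt, cad_model):
    # End-to-end pipeline: detection/segmentation -> hypothesis
    # generation -> refinement and scoring.
    masks = detect_and_segment(rgb, prompt)
    candidates = generate_hypotheses(masks, cad_model)
    return refine_and_score(candidates, depth)
```

The point of the sketch is the data flow between stages: masks from the detector feed the hypothesis generator, and only the refinement stage touches the depth channel, which matches the paper's RGB-D evaluation setting.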
Journal introduction:
The journal, Robotics and Computer-Integrated Manufacturing, focuses on sharing research applications that contribute to the development of new or enhanced robotics, manufacturing technologies, and innovative manufacturing strategies that are relevant to industry. Papers that combine theory and experimental validation are preferred, while review papers on current robotics and manufacturing issues are also considered. However, papers on traditional machining processes, modeling and simulation, supply chain management, and resource optimization are generally not within the scope of the journal, as there are more appropriate journals for these topics. Similarly, papers that are overly theoretical or mathematical will be directed to other suitable journals. The journal welcomes original papers in areas such as industrial robotics, human-robot collaboration in manufacturing, cloud-based manufacturing, cyber-physical production systems, big data analytics in manufacturing, smart mechatronics, machine learning, adaptive and sustainable manufacturing, and other fields involving unique manufacturing technologies.