Haotian Lei, Xiangyu Liu, Yan Zhou, Guo Niu, Changan Yi, Yuexia Zhou, Xiaofeng Liang, Fuhe Liu
{"title":"MMFEIR:多关注互特征增强与实例重构的类别级6D目标姿态估计","authors":"Haotian Lei, Xiangyu Liu, Yan Zhou, Guo Niu, Changan Yi, Yuexia Zhou, Xiaofeng Liang, Fuhe Liu","doi":"10.1016/j.imavis.2025.105657","DOIUrl":null,"url":null,"abstract":"<div><div>Category-level 6D object pose estimation is a fundamental problem in fields such as robotic manipulation and augmented reality. The goal of this task is to predict the rotation, translation, and size of the object. Current research typically extracts the deformation field from observed point cloud of the object for estimating 6D pose. However, they did not fully consider the interaction between the observed point cloud, prior shape, and image of the object, resulting in the loss of geometric and texture features of the object, thereby affecting the accuracy of pose estimation for objects with large intra class configuration differences. In this paper, we propose a Multi-attention Mutual Feature Enhance Module (MMFEM) to enhance the inherent linkages among different perception data of objects. MMFEM enhances the interaction between images, observed point cloud, and prior shape through multiple attention modules. This enables the network to gain a deeper understanding of the differences between distinct instances. In addition, to improve the feature expression of geometric details for objects, we propose the Instance Reconstruction Deformation Module (IRDM). IRDM reconstructed the three-dimensional instance point cloud for each object, enhancing the model’s ability to identify differences in geometric configurations of objects. Extensive experiments on the CAMERA25 and REAL275 datasets show that the proposed methods have achieved 79.0% and 91.2% on the 3D75 metric, 52.6% and 75.9% on the 5°2 cm metric, respectively, outperforming current mainstream methods.</div></div>","PeriodicalId":50374,"journal":{"name":"Image and Vision Computing","volume":"162 ","pages":"Article 105657"},"PeriodicalIF":4.2000,"publicationDate":"2025-07-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"MMFEIR: Multi-attention Mutual Feature Enhance and Instance Reconstruction for category-level 6D object pose estimation\",\"authors\":\"Haotian Lei, Xiangyu Liu, Yan Zhou, Guo Niu, Changan Yi, Yuexia Zhou, Xiaofeng Liang, Fuhe Liu\",\"doi\":\"10.1016/j.imavis.2025.105657\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><div>Category-level 6D object pose estimation is a fundamental problem in fields such as robotic manipulation and augmented reality. The goal of this task is to predict the rotation, translation, and size of the object. Current research typically extracts the deformation field from observed point cloud of the object for estimating 6D pose. However, they did not fully consider the interaction between the observed point cloud, prior shape, and image of the object, resulting in the loss of geometric and texture features of the object, thereby affecting the accuracy of pose estimation for objects with large intra class configuration differences. In this paper, we propose a Multi-attention Mutual Feature Enhance Module (MMFEM) to enhance the inherent linkages among different perception data of objects. MMFEM enhances the interaction between images, observed point cloud, and prior shape through multiple attention modules. This enables the network to gain a deeper understanding of the differences between distinct instances. 
In addition, to improve the feature expression of geometric details for objects, we propose the Instance Reconstruction Deformation Module (IRDM). IRDM reconstructed the three-dimensional instance point cloud for each object, enhancing the model’s ability to identify differences in geometric configurations of objects. Extensive experiments on the CAMERA25 and REAL275 datasets show that the proposed methods have achieved 79.0% and 91.2% on the 3D75 metric, 52.6% and 75.9% on the 5°2 cm metric, respectively, outperforming current mainstream methods.</div></div>\",\"PeriodicalId\":50374,\"journal\":{\"name\":\"Image and Vision Computing\",\"volume\":\"162 \",\"pages\":\"Article 105657\"},\"PeriodicalIF\":4.2000,\"publicationDate\":\"2025-07-16\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Image and Vision Computing\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S0262885625002458\",\"RegionNum\":3,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q2\",\"JCRName\":\"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Image and Vision Computing","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0262885625002458","RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
MMFEIR: Multi-attention Mutual Feature Enhance and Instance Reconstruction for category-level 6D object pose estimation
Category-level 6D object pose estimation is a fundamental problem in fields such as robotic manipulation and augmented reality. The goal of this task is to predict the rotation, translation, and size of an object. Current research typically extracts a deformation field from the observed point cloud of the object to estimate the 6D pose. However, these methods do not fully consider the interactions among the observed point cloud, the prior shape, and the image of the object, which leads to the loss of geometric and texture features and degrades pose-estimation accuracy for objects with large intra-class configuration differences. In this paper, we propose a Multi-attention Mutual Feature Enhance Module (MMFEM) to strengthen the inherent linkages among the different perception data of an object. MMFEM enhances the interaction among the image, the observed point cloud, and the prior shape through multiple attention modules, enabling the network to gain a deeper understanding of the differences between distinct instances. In addition, to improve the feature expression of geometric details, we propose the Instance Reconstruction Deformation Module (IRDM). IRDM reconstructs a three-dimensional instance point cloud for each object, enhancing the model's ability to identify differences in the geometric configurations of objects. Extensive experiments on the CAMERA25 and REAL275 datasets show that the proposed method achieves 79.0% and 91.2% on the 3D75 metric and 52.6% and 75.9% on the 5°2 cm metric, respectively, outperforming current mainstream methods.
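Since the abstract only sketches the architecture, the following minimal PyTorch sketch illustrates how mutual cross-attention among the three modalities (MMFEM-style) and a prior-shape deformation head (IRDM-style) could plausibly be wired together. All class names, feature dimensions, and layer choices here are illustrative assumptions, not the authors' implementation.

# Hedged sketch of MMFEM-style mutual feature enhancement and an
# IRDM-style deformation head. Module names and dimensions are
# assumptions inferred from the abstract, not the paper's code.

import torch
import torch.nn as nn


class MutualFeatureEnhance(nn.Module):
    """Hypothetical mutual enhancement: each modality attends to the other two."""

    def __init__(self, dim: int = 128, heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def enhance(self, query: torch.Tensor, context: torch.Tensor) -> torch.Tensor:
        # query: (B, Nq, C) features of one modality
        # context: (B, Nc, C) concatenated features of the other two modalities
        out, _ = self.attn(query, context, context)
        return self.norm(query + out)  # residual connection

    def forward(self, img_f, pts_f, prior_f):
        # Each modality is enhanced by cross-attending to the other two.
        img_e = self.enhance(img_f, torch.cat([pts_f, prior_f], dim=1))
        pts_e = self.enhance(pts_f, torch.cat([img_f, prior_f], dim=1))
        prior_e = self.enhance(prior_f, torch.cat([img_f, pts_f], dim=1))
        return img_e, pts_e, prior_e


class DeformationHead(nn.Module):
    """Hypothetical IRDM-style head: predicts per-point offsets that deform
    the categorical prior shape toward the observed instance."""

    def __init__(self, dim: int = 128):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(dim, dim), nn.ReLU(),
            nn.Linear(dim, 3),  # 3D offset per prior point
        )

    def forward(self, prior_points: torch.Tensor, prior_feat: torch.Tensor):
        # prior_points: (B, M, 3) prior shape; prior_feat: (B, M, C)
        offsets = self.mlp(prior_feat)
        return prior_points + offsets  # reconstructed instance point cloud


if __name__ == "__main__":
    B, N, M, C = 2, 1024, 1024, 128
    img_f = torch.randn(B, N, C)     # sampled per-pixel image features
    pts_f = torch.randn(B, N, C)     # observed point cloud features
    prior_f = torch.randn(B, M, C)   # prior shape features
    prior_pts = torch.randn(B, M, 3)

    _, _, prior_e = MutualFeatureEnhance(C)(img_f, pts_f, prior_f)
    recon = DeformationHead(C)(prior_pts, prior_e)
    print(recon.shape)  # torch.Size([2, 1024, 3])

The enhanced prior features feed the deformation head, so the reconstructed instance point cloud reflects both image texture and observed geometry; the actual paper may fuse the modalities or parameterize the deformation differently.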
Journal Introduction:
Image and Vision Computing has as its primary aim the provision of an effective medium of interchange for the results of high-quality theoretical and applied research fundamental to all aspects of image interpretation and computer vision. The journal publishes work that proposes new image interpretation and computer vision methodology or addresses the application of such methods to real-world scenes. It seeks to foster a deeper understanding of the discipline by encouraging the quantitative comparison and performance evaluation of proposed methodologies. Coverage includes: image interpretation, scene modelling, object recognition and tracking, shape analysis, monitoring and surveillance, active vision and robotic systems, SLAM, biologically inspired computer vision, motion analysis, stereo vision, document image understanding, character and handwritten text recognition, face and gesture recognition, biometrics, vision-based human-computer interaction, human activity and behavior understanding, data fusion from multiple sensor inputs, and image databases.