Haitao Xiao, Linkun Ma, Qinyao Li, Shuo Ma, Hongxuan Guo, Wenjie Wang, Harutoshi Ogai
Neurocomputing, vol. 642, Article 130371. Published 2025-05-13. DOI: 10.1016/j.neucom.2025.130371. URL: https://www.sciencedirect.com/science/article/pii/S0925231225010434
A novel adaptive weighted fusion network based on pixel level feature importance for two-stage 6D pose estimation
In intelligent industry, accurately recognizing and localizing objects in an image is the basis for robots to perform autonomous, intelligent operations. With the rapid development and application of deep-learning data fusion in pose estimation, existing 6D pose estimation methods have made considerable progress. However, most of them are not accurate enough in scenes with cluttered backgrounds, inconspicuous textures, and occluded objects. In addition, existing methods ignore the effect of instance segmentation accuracy on pose estimation accuracy. To address these issues, this paper proposes a two-stage 6D pose estimation method, named TAPWFusion, based on an adaptive pixel-importance weighted fusion network with lightweight instance segmentation. In the instance segmentation stage, a lightweight instance segmentation network based on multiscale attention and boundary constraints, named CVi-BC-YOLO, is proposed to improve segmentation accuracy and efficiency. In the pose estimation stage, to eliminate the interference of lighting and occlusion and to improve pose estimation accuracy, we propose an adaptive pixel-importance weighted fusion network, named APWFusion, which adaptively evaluates the importance of the RGB color information and the geometric information of the point cloud. Experiments on the LineMOD, YCB-Video, and T-LESS datasets demonstrate the effectiveness and advanced performance of the proposed method.
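The abstract does not specify how APWFusion computes its importance weights, so the following is only a minimal sketch of the general idea of per-pixel adaptive weighting between two modalities, not the authors' network. It assumes each pixel already has an RGB feature vector and a geometry feature vector of the same dimension, and it uses hypothetical scoring vectors `w_rgb` and `w_geo` (stand-ins for learned parameters) with a softmax over the two modalities.

```python
import numpy as np


def adaptive_weighted_fusion(rgb_feat, geo_feat, w_rgb, w_geo):
    """Fuse per-pixel RGB and geometry features with per-pixel importance weights.

    rgb_feat, geo_feat: (N, D) arrays, one row per pixel/point.
    w_rgb, w_geo: (D,) hypothetical scoring vectors (assumed to be learned
    elsewhere; they are illustration-only and not taken from the paper).
    Returns the fused (N, D) features and the (N, 2) modality weights.
    """
    # One scalar importance score per pixel and per modality.
    scores = np.stack([rgb_feat @ w_rgb, geo_feat @ w_geo], axis=1)  # (N, 2)
    # Numerically stable softmax over the two modalities.
    scores = scores - scores.max(axis=1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=1, keepdims=True)
    # Weighted sum: each pixel mixes its two feature vectors by its own weights.
    fused = weights[:, :1] * rgb_feat + weights[:, 1:] * geo_feat
    return fused, weights


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    rgb = rng.normal(size=(5, 4))
    geo = rng.normal(size=(5, 4))
    fused, weights = adaptive_weighted_fusion(rgb, geo, np.ones(4), -np.ones(4))
    print(fused.shape, weights.sum(axis=1))
```

With this kind of weighting, a pixel whose color features are unreliable (for example under strong lighting changes) can lean on its geometric features instead, and vice versa; how the actual method scores importance is described in the paper, not here.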
About the journal:
Neurocomputing publishes articles describing recent fundamental contributions in the field of neurocomputing. Neurocomputing theory, practice and applications are the essential topics being covered.