{"title":"Cross-modal attention and geometric contextual aggregation network for 6DoF object pose estimation","authors":"Yi Guo, Fei Wang, Hao Chu, Shiguang Wen","doi":"10.1016/j.neucom.2024.128891","DOIUrl":null,"url":null,"abstract":"<div><div>The availability of affordable RGB-D sensors has made it more suitable to use RGB-D images for accurate 6D pose estimation, which allows for precise 6D parameter prediction using RGB-D images while maintaining a reasonable cost. A crucial research challenge is effectively exploiting adaptive feature extraction and fusion from the appearance information of RGB images and the geometric information of depth images. Moreover, previous methods have neglected the spatial geometric relationships of local position and the properties of point features, which are beneficial for tackling pose estimation in occlusion scenarios. In this work, we propose a cross-attention fusion framework for learning 6D pose estimation from RGB-D images. During the feature extraction stage, we design a geometry-aware context network that encodes local geometric properties of objects in point clouds using dual criteria, distance, and geometric angles. Moreover, we propose a cross-attention framework that combines spatial and channel attention in a cross-modal attention manner. This innovative framework enables us to capture the correlation and importance between RGB and depth features, resulting in improved accuracy in pose estimation, particularly in complex scenes. In the experimental results, we demonstrated that the proposed method outperforms state-of-the-art methods on four challenging benchmark datasets: YCB-Video, LineMOD, Occlusion LineMOD, and MP6D. Video is available at <span><span>https://youtu.be/4mgdbQKaHOc</span></span>.</div></div>","PeriodicalId":19268,"journal":{"name":"Neurocomputing","volume":"617 ","pages":"Article 128891"},"PeriodicalIF":5.5000,"publicationDate":"2024-11-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Neurocomputing","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S092523122401662X","RegionNum":2,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
Citations: 0
Abstract
The availability of affordable RGB-D sensors has made RGB-D images a practical choice for accurate 6D pose estimation, enabling precise 6D parameter prediction at reasonable cost. A crucial research challenge is effectively exploiting adaptive feature extraction and fusion from the appearance information of RGB images and the geometric information of depth images. Moreover, previous methods have neglected the spatial geometric relationships of local positions and the properties of point features, which are beneficial for tackling pose estimation under occlusion. In this work, we propose a cross-attention fusion framework for learning 6D pose estimation from RGB-D images. During the feature extraction stage, we design a geometry-aware context network that encodes local geometric properties of objects in point clouds using dual criteria: distance and geometric angles. Moreover, we propose a cross-attention framework that combines spatial and channel attention in a cross-modal manner. This framework captures the correlation and relative importance between RGB and depth features, improving pose-estimation accuracy, particularly in complex scenes. Experimental results demonstrate that the proposed method outperforms state-of-the-art methods on four challenging benchmark datasets: YCB-Video, LineMOD, Occlusion LineMOD, and MP6D. Video is available at https://youtu.be/4mgdbQKaHOc.
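The cross-modal fusion idea described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the paper does not specify the exact projections, heads, or normalization here, so plain dot-product attention and global-pooled channel weights stand in for the learned modules, and all function and variable names (`cross_modal_attention`, `rgb`, `depth`) are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_modal_attention(rgb, depth):
    """Fuse per-point RGB and depth features with cross-modal
    channel and spatial attention (simplified sketch).

    rgb, depth: (N, C) arrays of N point features with C channels.
    Returns a fused (N, C) feature map.
    """
    n, c = rgb.shape
    # Channel attention: each modality's channel importance is
    # estimated from the *other* modality's global descriptor,
    # so appearance and geometry re-weight one another.
    ch_w_rgb = softmax(depth.mean(axis=0))     # (C,) weights from depth
    ch_w_depth = softmax(rgb.mean(axis=0))     # (C,) weights from RGB
    rgb_c = rgb * ch_w_rgb * c                 # rescale to keep magnitude
    depth_c = depth * ch_w_depth * c

    # Spatial (per-point) attention: scaled dot-product scores let
    # each RGB point attend to geometrically informative depth points.
    scores = rgb_c @ depth_c.T / np.sqrt(c)    # (N, N)
    attended = softmax(scores, axis=-1) @ depth_c  # (N, C)

    # Residual fusion of appearance and attended geometry features.
    return rgb_c + attended

rng = np.random.default_rng(0)
fused = cross_modal_attention(rng.normal(size=(128, 64)),
                              rng.normal(size=(128, 64)))
print(fused.shape)  # (128, 64)
```

In a trained network the pooling and dot products would be replaced by learned query/key/value projections, but the data flow, cross-modal re-weighting followed by attention-weighted aggregation and residual fusion, is the pattern the abstract describes.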
Journal Introduction:
Neurocomputing publishes articles describing recent fundamental contributions in the field of neurocomputing. Its essential topics cover neurocomputing theory, practice, and applications.