MGPose: Wide-Baseline Relative Camera Pose Estimation Using Matching-Guided Dual Channel-Attention
Authors: Wangping Wu; Chuhua Huang; Yongxing Shen; Xin Huang
Journal: IEEE Robotics and Automation Letters, vol. 10, no. 12, pp. 12293-12300 (impact factor 5.3; JCR Q2, Robotics)
Publication date: 2025-10-15
Publication type: Journal Article
DOI: 10.1109/LRA.2025.3621968
URL: https://ieeexplore.ieee.org/document/11204007/
Relative camera pose estimation is a fundamental task in computer vision and robotics. In wide-baseline scenarios with limited visual overlap, traditional methods often perform poorly. Existing deep learning approaches are also hindered by irrelevant features and insufficient modeling of the relative motion between image pairs, making accurate pose estimation particularly challenging. In this letter, we propose MGPose, a camera relative pose estimation method using a matching-guided dual-channel attention mechanism. For wide-baseline image pairs, MGPose effectively reduces interference from uncorrelated features through a feature matching strategy, utilizes camera motion prior knowledge to capture the relative motion characteristics of matched points, and employs a bidirectional channel cross-attention mechanism along with a channel self-attention mechanism to fully capture the interactions between different channels of matched points, enabling efficient feature fusion for the image pairs. Extensive experiments on Matterport3D and ScanNet show that MGPose outperforms or matches state-of-the-art methods in camera relative pose estimation.
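The abstract names two building blocks, a bidirectional channel cross-attention and a channel self-attention over the features of matched points, without giving implementation details. The following is only a minimal NumPy sketch of what attention across channels (rather than across points) can look like. It is not the authors' implementation: the function names, the feature shapes, the residual connections, the concatenation step, and the absence of learned projection weights are all assumptions made for illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def channel_attention(query_feats, context_feats):
    """Attend across channels instead of across points.

    Both inputs are (N, C) arrays: N matched points, C feature channels.
    Each channel is treated as a token whose descriptor is its response
    over the N points. Returns the attended features (N, C) and the
    channel-to-channel attention weights (C, C).
    Hypothetical helper: no learned projections are used here.
    """
    n_points = query_feats.shape[0]
    q = query_feats.T    # (C, N): one row per query channel
    k = context_feats.T  # (C, N): one row per context channel
    v = context_feats.T  # (C, N)
    weights = softmax(q @ k.T / np.sqrt(n_points), axis=-1)  # (C, C)
    attended = (weights @ v).T                               # (N, C)
    return attended, weights


def fuse_matched_features(feats_a, feats_b):
    """Assumed fusion scheme: cross-attention in both directions,
    concatenation, then channel self-attention, with residual sums."""
    a_from_b, _ = channel_attention(feats_a, feats_b)  # image A queries image B
    b_from_a, _ = channel_attention(feats_b, feats_a)  # image B queries image A
    fused = np.concatenate([feats_a + a_from_b, feats_b + b_from_a], axis=1)
    refined, _ = channel_attention(fused, fused)       # channel self-attention
    return fused + refined                             # (N, 2C)


# Synthetic stand-ins for the features of 128 matched points, 32 channels each.
rng = np.random.default_rng(0)
feats_a = rng.standard_normal((128, 32))
feats_b = rng.standard_normal((128, 32))
fused = fuse_matched_features(feats_a, feats_b)
print(fused.shape)  # (128, 64)
```

In a full model the fused features would presumably feed a regression head that outputs the relative rotation and translation; that part is not described in the abstract and is left out of this sketch.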
About the journal:
The scope of this journal is to publish peer-reviewed articles that provide a timely and concise account of innovative research ideas and application results, reporting significant theoretical findings and application case studies in areas of robotics and automation.