Qingyu Xian , Weiqin Jiao , Hao Cheng , Berend Jan van der Zwaag , Yanqiu Huang
DOI: 10.1016/j.isprsjprs.2025.08.031
Journal: ISPRS Journal of Photogrammetry and Remote Sensing, vol. 230, pp. 109-125
Publication date: 2025-09-17
Publication type: Journal Article
Impact factor: 12.2 (JCR Q1, Geography, Physical)
URL: https://www.sciencedirect.com/science/article/pii/S092427162500348X
T-graph: Enhancing sparse-view camera pose estimation by pairwise translation graph
Sparse-view camera pose estimation, which aims to recover 6-Degree-of-Freedom (6-DoF) poses from a limited number of unordered multi-view images, is fundamental yet challenging in remote sensing. Learning-based methods offer greater robustness than traditional Structure-from-Motion (SfM) pipelines by leveraging dense high-dimensional features and implicit learning, rather than sparse keypoints and limited geometric constraints. However, they often neglect pairwise translation cues between views, resulting in suboptimal performance in sparse-view scenarios. To address this limitation, we introduce T-Graph, a lightweight, plug-and-play module to enhance camera pose estimation in sparse-view settings. T-Graph takes paired image features as input and maps them through a Multilayer Perceptron (MLP). It then constructs a fully connected translation graph, where nodes represent cameras and edges encode their translation relationships. It can be seamlessly integrated into most existing learning-based models as an additional branch in parallel with the original prediction, maintaining efficiency and ease of use. Furthermore, we introduce two pairwise translation representations, relative-t and pair-t, formulated under different local coordinate systems. While relative-t captures intuitive spatial relationships, pair-t offers a rotation-disentangled alternative. The two representations contribute to enhanced adaptability across diverse application scenarios, further improving our module’s robustness. We further propose an indicator termed the Camera Axis Dispersion Ratio (CADR) to quantitatively assess which type of pairwise translation representation is better suited for a given camera configuration in a dataset. Extensive experiments on three representative methods (RelPose++, Forge and 8Pt-ViT) using public datasets (CO3D and IMC PhotoTourism) validate both the effectiveness and generalizability of T-Graph. The results demonstrate consistent improvements across various metrics, notably camera center accuracy, which improves by up to 6% across 2 to 8 viewpoints.
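To make the idea of a fully connected translation graph concrete, the following is a minimal sketch and not the authors' implementation. It assumes known world-to-camera poses with the convention x_cam = R x_world + t, and it labels each ordered camera pair (i, j) with the standard relative translation t_ij = t_j - R_j R_i^T t_i, which is the center of camera i expressed in the frame of camera j. The abstract does not give the exact definitions of relative-t or pair-t, so this edge label is only one plausible choice, and the function names `relative_translation` and `build_translation_graph` are hypothetical. In the paper the edges are predicted from paired image features by an MLP; a quantity like the one below could at most serve as a geometric reference for such predictions.

```python
from itertools import permutations

# 3x3 helpers on nested lists, so the sketch needs no third-party packages.
def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def transpose(a):
    return [[a[j][i] for j in range(3)] for i in range(3)]

def matvec(a, v):
    return [sum(a[i][k] * v[k] for k in range(3)) for i in range(3)]

def relative_translation(pose_i, pose_j):
    """Translation of the transform camera i -> camera j.

    Assumed convention (not taken from the paper): each pose is (R, t)
    with x_cam = R @ x_world + t. The result equals the center of
    camera i expressed in camera j's coordinate frame.
    """
    R_i, t_i = pose_i
    R_j, t_j = pose_j
    R_ij = matmul(R_j, transpose(R_i))   # relative rotation i -> j
    moved = matvec(R_ij, t_i)
    return [t_j[k] - moved[k] for k in range(3)]

def build_translation_graph(poses):
    """Fully connected directed graph: nodes are camera indices,
    edges (i, j) carry the pairwise translation for that ordered pair."""
    return {(i, j): relative_translation(poses[i], poses[j])
            for i, j in permutations(range(len(poses)), 2)}

# Illustrative poses (made up for this sketch).
IDENTITY = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
ROT_Z_90 = [[0, -1, 0], [1, 0, 0], [0, 0, 1]]
example_poses = [
    (IDENTITY, [0, 0, 0]),    # camera 0 at the world origin
    (IDENTITY, [-1, 0, 0]),   # camera 1 centered at (1, 0, 0)
    (ROT_Z_90, [2, 0, 0]),    # camera 2 centered at (0, 2, 0), rotated about z
]
example_graph = build_translation_graph(example_poses)
```

With N cameras the graph has N(N-1) directed edges, so the 2 to 8 views considered in the abstract give between 2 and 56 edges. The edge count stays small in the sparse-view regime, which is consistent with the module being described as lightweight.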
About the journal:
The ISPRS Journal of Photogrammetry and Remote Sensing (P&RS) serves as the official journal of the International Society for Photogrammetry and Remote Sensing (ISPRS). It acts as a platform for scientists and professionals worldwide who are involved in various disciplines that utilize photogrammetry, remote sensing, spatial information systems, computer vision, and related fields. The journal aims to facilitate communication and dissemination of advancements in these disciplines, while also acting as a comprehensive source of reference and archive.
P&RS endeavors to publish high-quality, peer-reviewed research papers that are preferably original and have not been published before. These papers can cover scientific/research, technological development, or application/practical aspects. Additionally, the journal welcomes papers that are based on presentations from ISPRS meetings, as long as they are considered significant contributions to the aforementioned fields.
In particular, P&RS encourages the submission of papers that are of broad scientific interest, showcase innovative applications (especially in emerging fields), have an interdisciplinary focus, discuss topics that have received limited attention in P&RS or related journals, or explore new directions in scientific or professional realms. It is preferred that theoretical papers include practical applications, while papers focusing on systems and applications should include a theoretical background.