T-graph: Enhancing sparse-view camera pose estimation by pairwise translation graph

IF 12.2 | Region 1 (Earth Sciences) | Q1 GEOGRAPHY, PHYSICAL

ISPRS Journal of Photogrammetry and Remote Sensing | Pub Date: 2025-09-17 | DOI: 10.1016/j.isprsjprs.2025.08.031

Qingyu Xian, Weiqin Jiao, Hao Cheng, Berend Jan van der Zwaag, Yanqiu Huang

Volume 230, Pages 109-125 | https://www.sciencedirect.com/science/article/pii/S092427162500348X

Citations: 0

Abstract


T-graph: Enhancing sparse-view camera pose estimation by pairwise translation graph

Sparse-view camera pose estimation, which aims to recover 6-Degree-of-Freedom (6-DoF) poses from a limited number of unordered multi-view images, is fundamental yet challenging in remote sensing. Learning-based methods offer greater robustness than traditional Structure-from-Motion (SfM) pipelines by leveraging dense high-dimensional features and implicit learning, rather than sparse keypoints and limited geometric constraints. However, they often neglect pairwise translation cues between views, resulting in suboptimal performance in sparse-view scenarios. To address this limitation, we introduce T-Graph, a lightweight, plug-and-play module to enhance camera pose estimation in sparse-view settings. T-Graph takes paired image features as input and maps them through a Multilayer Perceptron (MLP). It then constructs a fully connected translation graph, where nodes represent cameras and edges encode their translation relationships. It can be seamlessly integrated into most existing learning-based models as an additional branch in parallel with the original prediction, maintaining efficiency and ease of use. Furthermore, we introduce two pairwise translation representations, relative-t and pair-t, formulated under different local coordinate systems. While relative-t captures intuitive spatial relationships, pair-t offers a rotation-disentangled alternative. The two representations contribute to enhanced adaptability across diverse application scenarios, further improving our module’s robustness. We further propose an indicator termed the Camera Axis Dispersion Ratio (CADR) to quantitatively assess which type of pairwise translation representation is better suited for a given camera configuration in a dataset. Extensive experiments on three representative methods (RelPose++, Forge and 8Pt-ViT) using public datasets (CO3D and IMC PhotoTourism) validate both the effectiveness and generalizability of T-Graph. The results demonstrate consistent improvements across various metrics, notably camera center accuracy, which improves by up to 6% across 2 to 8 viewpoints.
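
The abstract describes the module only at a high level, so the following is a minimal illustrative sketch rather than the authors' implementation. It assumes per-image feature vectors that are concatenated for each ordered camera pair, a small untrained two-layer MLP with hypothetical layer sizes, and one predicted 3-vector per directed edge of a fully connected graph. The helper `relative_translation` shows one common convention for a pairwise translation target (the translation part of the relative pose between two world-to-camera poses); whether this matches the paper's relative-t or pair-t definitions is not stated in the abstract. All function and variable names are made up for illustration.

```python
import itertools
import numpy as np


def init_mlp(rng, in_dim, hidden_dim, out_dim=3):
    """Random weights for a two-layer MLP (hypothetical sizes, untrained)."""
    return {
        "w1": rng.standard_normal((in_dim, hidden_dim)) * 0.1,
        "b1": np.zeros(hidden_dim),
        "w2": rng.standard_normal((hidden_dim, out_dim)) * 0.1,
        "b2": np.zeros(out_dim),
    }


def mlp(params, x):
    """Apply the two-layer MLP with a ReLU nonlinearity."""
    hidden = np.maximum(x @ params["w1"] + params["b1"], 0.0)
    return hidden @ params["w2"] + params["b2"]


def build_translation_graph(features, params):
    """Predict one 3-vector per ordered camera pair (i, j) with i != j.

    Nodes are cameras; each directed edge stores a predicted translation.
    Concatenating the two feature vectors is an assumption of this sketch.
    """
    num_views = features.shape[0]
    edges = {}
    for i, j in itertools.permutations(range(num_views), 2):
        pair_feature = np.concatenate([features[i], features[j]])
        edges[(i, j)] = mlp(params, pair_feature)
    return edges


def relative_translation(R_i, t_i, R_j, t_j):
    """Translation of the relative pose T_j @ inv(T_i).

    Assumes world-to-camera poses, i.e. x_cam = R @ x_world + t.
    """
    return t_j - R_j @ R_i.T @ t_i


rng = np.random.default_rng(0)
num_views, feat_dim = 4, 16
features = rng.standard_normal((num_views, feat_dim))
params = init_mlp(rng, in_dim=2 * feat_dim, hidden_dim=32)
graph = build_translation_graph(features, params)
```

With N views the graph has N(N-1) directed edges, so the branch stays cheap in the 2 to 8 view range discussed in the abstract. In a training setup, targets such as those from `relative_translation` could supervise the edge predictions alongside the host model's original pose loss.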


Source journal

ISPRS Journal of Photogrammetry and Remote Sensing (Engineering & Technology: Imaging Science & Photographic Technology)

CiteScore: 21.00

Self-citation rate: 6.30%

Articles published: 273

Review time: 40 days

Journal description: The ISPRS Journal of Photogrammetry and Remote Sensing (P&RS) serves as the official journal of the International Society for Photogrammetry and Remote Sensing (ISPRS). It acts as a platform for scientists and professionals worldwide who are involved in various disciplines that utilize photogrammetry, remote sensing, spatial information systems, computer vision, and related fields. The journal aims to facilitate communication and dissemination of advancements in these disciplines, while also acting as a comprehensive source of reference and archive. P&RS endeavors to publish high-quality, peer-reviewed research papers that are preferably original and have not been published before. These papers can cover scientific/research, technological development, or application/practical aspects. Additionally, the journal welcomes papers that are based on presentations from ISPRS meetings, as long as they are considered significant contributions to the aforementioned fields. In particular, P&RS encourages the submission of papers that are of broad scientific interest, showcase innovative applications (especially in emerging fields), have an interdisciplinary focus, discuss topics that have received limited attention in P&RS or related journals, or explore new directions in scientific or professional realms. It is preferred that theoretical papers include practical applications, while papers focusing on systems and applications should include a theoretical background.