一般对象作为姿态探针的少镜头视图合成

IF 11.1 1区工程技术 Q1 ENGINEERING, ELECTRICAL & ELECTRONIC

IEEE Transactions on Circuits and Systems for Video Technology Pub Date : 2025-03-14 DOI:10.1109/TCSVT.2025.3551303

Zhirui Gao;Renjiao Yi;Chenyang Zhu;Ke Zhuang;Wei Chen;Kai Xu

{"title":"一般对象作为姿态探针的少镜头视图合成","authors":"Zhirui Gao;Renjiao Yi;Chenyang Zhu;Ke Zhuang;Wei Chen;Kai Xu","doi":"10.1109/TCSVT.2025.3551303","DOIUrl":null,"url":null,"abstract":"Radiance fields, including NeRFs and 3D Gaussians, demonstrate great potential in high-fidelity rendering and scene reconstruction, while they require a substantial number of posed images as input. COLMAP is frequently employed for preprocessing to estimate poses. However, COLMAP necessitates a large number of feature matches to operate effectively, and struggles with scenes characterized by sparse features, large baselines, or few-view images. We aim to tackle few-view NeRF reconstruction using only 3 to 6 unposed scene images, freeing from COLMAP initializations. Inspired by the idea of calibration boards in traditional pose calibration, we propose a novel approach of utilizing everyday objects, commonly found in both images and real life, as “pose probes”. By initializing the probe object as a cube shape, we apply a dual-branch volume rendering optimization (object NeRF and scene NeRF) to constrain the pose optimization and jointly refine the geometry. PnP matching is used to initialize poses between images incrementally, where only a few feature matches are enough. PoseProbe achieves state-of-the-art performance in pose estimation and novel view synthesis across multiple datasets in experiments. We demonstrate its effectiveness, particularly in few-view and large-baseline scenes where COLMAP struggles. In ablations, using different objects in a scene yields comparable performance, showing that PoseProbe is robust to the choice of probe objects. Our project page is available at: <uri>https://zhirui-gao.github.io/PoseProbe.github.io/</uri>","PeriodicalId":13082,"journal":{"name":"IEEE Transactions on Circuits and Systems for Video Technology","volume":"35 9","pages":"9046-9059"},"PeriodicalIF":11.1000,"publicationDate":"2025-03-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Generic Objects as Pose Probes for Few-Shot View Synthesis\",\"authors\":\"Zhirui Gao;Renjiao Yi;Chenyang Zhu;Ke Zhuang;Wei Chen;Kai Xu\",\"doi\":\"10.1109/TCSVT.2025.3551303\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Radiance fields, including NeRFs and 3D Gaussians, demonstrate great potential in high-fidelity rendering and scene reconstruction, while they require a substantial number of posed images as input. COLMAP is frequently employed for preprocessing to estimate poses. However, COLMAP necessitates a large number of feature matches to operate effectively, and struggles with scenes characterized by sparse features, large baselines, or few-view images. We aim to tackle few-view NeRF reconstruction using only 3 to 6 unposed scene images, freeing from COLMAP initializations. Inspired by the idea of calibration boards in traditional pose calibration, we propose a novel approach of utilizing everyday objects, commonly found in both images and real life, as “pose probes”. By initializing the probe object as a cube shape, we apply a dual-branch volume rendering optimization (object NeRF and scene NeRF) to constrain the pose optimization and jointly refine the geometry. PnP matching is used to initialize poses between images incrementally, where only a few feature matches are enough. PoseProbe achieves state-of-the-art performance in pose estimation and novel view synthesis across multiple datasets in experiments. We demonstrate its effectiveness, particularly in few-view and large-baseline scenes where COLMAP struggles. In ablations, using different objects in a scene yields comparable performance, showing that PoseProbe is robust to the choice of probe objects. Our project page is available at: <uri>https://zhirui-gao.github.io/PoseProbe.github.io/</uri>\",\"PeriodicalId\":13082,\"journal\":{\"name\":\"IEEE Transactions on Circuits and Systems for Video Technology\",\"volume\":\"35 9\",\"pages\":\"9046-9059\"},\"PeriodicalIF\":11.1000,\"publicationDate\":\"2025-03-14\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"IEEE Transactions on Circuits and Systems for Video Technology\",\"FirstCategoryId\":\"5\",\"ListUrlMain\":\"https://ieeexplore.ieee.org/document/10926495/\",\"RegionNum\":1,\"RegionCategory\":\"工程技术\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"ENGINEERING, ELECTRICAL & ELECTRONIC\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Circuits and Systems for Video Technology","FirstCategoryId":"5","ListUrlMain":"https://ieeexplore.ieee.org/document/10926495/","RegionNum":1,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"ENGINEERING, ELECTRICAL & ELECTRONIC","Score":null,"Total":0}

引用次数: 0

摘要

辐射场，包括nerf和3D高斯场，在高保真渲染和场景重建方面显示出巨大的潜力，尽管它们需要大量的摆姿势图像作为输入。COLMAP常用于预估姿态的预处理。然而，COLMAP需要大量的特征匹配才能有效地运行，并且难以处理以稀疏特征、大基线或少视图图像为特征的场景。我们的目标是只使用3到6个未放置的场景图像来处理少视图NeRF重建，从而摆脱COLMAP初始化。受传统姿势校准中校准板思想的启发，我们提出了一种利用图像和现实生活中常见的日常物体作为“姿势探针”的新方法。通过将探测对象初始化为立方体形状，我们应用双分支体渲染优化（对象NeRF和场景NeRF）来约束姿态优化并共同优化几何形状。PnP匹配用于逐步初始化图像之间的姿态，其中只有少量的特征匹配就足够了。在实验中，PoseProbe在姿态估计和跨多个数据集的新颖视图合成方面实现了最先进的性能。我们证明了它的有效性，特别是在COLMAP挣扎的少数视图和大基线场景中。在烧蚀中，在场景中使用不同的对象会产生相当的性能，这表明PoseProbe对探测对象的选择具有鲁棒性。我们的项目页面可访问：https://zhirui-gao.github.io/PoseProbe.github.io/

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Generic Objects as Pose Probes for Few-Shot View Synthesis

Radiance fields, including NeRFs and 3D Gaussians, demonstrate great potential in high-fidelity rendering and scene reconstruction, while they require a substantial number of posed images as input. COLMAP is frequently employed for preprocessing to estimate poses. However, COLMAP necessitates a large number of feature matches to operate effectively, and struggles with scenes characterized by sparse features, large baselines, or few-view images. We aim to tackle few-view NeRF reconstruction using only 3 to 6 unposed scene images, freeing from COLMAP initializations. Inspired by the idea of calibration boards in traditional pose calibration, we propose a novel approach of utilizing everyday objects, commonly found in both images and real life, as “pose probes”. By initializing the probe object as a cube shape, we apply a dual-branch volume rendering optimization (object NeRF and scene NeRF) to constrain the pose optimization and jointly refine the geometry. PnP matching is used to initialize poses between images incrementally, where only a few feature matches are enough. PoseProbe achieves state-of-the-art performance in pose estimation and novel view synthesis across multiple datasets in experiments. We demonstrate its effectiveness, particularly in few-view and large-baseline scenes where COLMAP struggles. In ablations, using different objects in a scene yields comparable performance, showing that PoseProbe is robust to the choice of probe objects. Our project page is available at: https://zhirui-gao.github.io/PoseProbe.github.io/

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

IEEE Transactions on Circuits and Systems for Video Technology 工程技术-工程：电子与电气

CiteScore

13.80

自引率

27.40%

发文量

660

审稿时长

5 months

期刊介绍： The IEEE Transactions on Circuits and Systems for Video Technology (TCSVT) is dedicated to covering all aspects of video technologies from a circuits and systems perspective. We encourage submissions of general, theoretical, and application-oriented papers related to image and video acquisition, representation, presentation, and display. Additionally, we welcome contributions in areas such as processing, filtering, and transforms; analysis and synthesis; learning and understanding; compression, transmission, communication, and networking; as well as storage, retrieval, indexing, and search. Furthermore, papers focusing on hardware and software design and implementation are highly valued. Join us in advancing the field of video technology through innovative research and insights.