Generic Objects as Pose Probes for Few-Shot View Synthesis

IF 11.1 | CAS Zone 1 (Engineering & Technology) | JCR Q1 (ENGINEERING, ELECTRICAL & ELECTRONIC)
Zhirui Gao;Renjiao Yi;Chenyang Zhu;Ke Zhuang;Wei Chen;Kai Xu
{"title":"一般对象作为姿态探针的少镜头视图合成","authors":"Zhirui Gao;Renjiao Yi;Chenyang Zhu;Ke Zhuang;Wei Chen;Kai Xu","doi":"10.1109/TCSVT.2025.3551303","DOIUrl":null,"url":null,"abstract":"Radiance fields, including NeRFs and 3D Gaussians, demonstrate great potential in high-fidelity rendering and scene reconstruction, while they require a substantial number of posed images as input. COLMAP is frequently employed for preprocessing to estimate poses. However, COLMAP necessitates a large number of feature matches to operate effectively, and struggles with scenes characterized by sparse features, large baselines, or few-view images. We aim to tackle few-view NeRF reconstruction using only 3 to 6 unposed scene images, freeing from COLMAP initializations. Inspired by the idea of calibration boards in traditional pose calibration, we propose a novel approach of utilizing everyday objects, commonly found in both images and real life, as “pose probes”. By initializing the probe object as a cube shape, we apply a dual-branch volume rendering optimization (object NeRF and scene NeRF) to constrain the pose optimization and jointly refine the geometry. PnP matching is used to initialize poses between images incrementally, where only a few feature matches are enough. PoseProbe achieves state-of-the-art performance in pose estimation and novel view synthesis across multiple datasets in experiments. We demonstrate its effectiveness, particularly in few-view and large-baseline scenes where COLMAP struggles. In ablations, using different objects in a scene yields comparable performance, showing that PoseProbe is robust to the choice of probe objects. Our project page is available at: <uri>https://zhirui-gao.github.io/PoseProbe.github.io/</uri>","PeriodicalId":13082,"journal":{"name":"IEEE Transactions on Circuits and Systems for Video Technology","volume":"35 9","pages":"9046-9059"},"PeriodicalIF":11.1000,"publicationDate":"2025-03-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Generic Objects as Pose Probes for Few-Shot View Synthesis\",\"authors\":\"Zhirui Gao;Renjiao Yi;Chenyang Zhu;Ke Zhuang;Wei Chen;Kai Xu\",\"doi\":\"10.1109/TCSVT.2025.3551303\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Radiance fields, including NeRFs and 3D Gaussians, demonstrate great potential in high-fidelity rendering and scene reconstruction, while they require a substantial number of posed images as input. COLMAP is frequently employed for preprocessing to estimate poses. However, COLMAP necessitates a large number of feature matches to operate effectively, and struggles with scenes characterized by sparse features, large baselines, or few-view images. We aim to tackle few-view NeRF reconstruction using only 3 to 6 unposed scene images, freeing from COLMAP initializations. Inspired by the idea of calibration boards in traditional pose calibration, we propose a novel approach of utilizing everyday objects, commonly found in both images and real life, as “pose probes”. By initializing the probe object as a cube shape, we apply a dual-branch volume rendering optimization (object NeRF and scene NeRF) to constrain the pose optimization and jointly refine the geometry. PnP matching is used to initialize poses between images incrementally, where only a few feature matches are enough. PoseProbe achieves state-of-the-art performance in pose estimation and novel view synthesis across multiple datasets in experiments. 
We demonstrate its effectiveness, particularly in few-view and large-baseline scenes where COLMAP struggles. In ablations, using different objects in a scene yields comparable performance, showing that PoseProbe is robust to the choice of probe objects. Our project page is available at: <uri>https://zhirui-gao.github.io/PoseProbe.github.io/</uri>\",\"PeriodicalId\":13082,\"journal\":{\"name\":\"IEEE Transactions on Circuits and Systems for Video Technology\",\"volume\":\"35 9\",\"pages\":\"9046-9059\"},\"PeriodicalIF\":11.1000,\"publicationDate\":\"2025-03-14\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"IEEE Transactions on Circuits and Systems for Video Technology\",\"FirstCategoryId\":\"5\",\"ListUrlMain\":\"https://ieeexplore.ieee.org/document/10926495/\",\"RegionNum\":1,\"RegionCategory\":\"工程技术\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"ENGINEERING, ELECTRICAL & ELECTRONIC\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Circuits and Systems for Video Technology","FirstCategoryId":"5","ListUrlMain":"https://ieeexplore.ieee.org/document/10926495/","RegionNum":1,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"ENGINEERING, ELECTRICAL & ELECTRONIC","Score":null,"Total":0}
Citations: 0

Abstract

Radiance fields, including NeRFs and 3D Gaussians, demonstrate great potential in high-fidelity rendering and scene reconstruction, while they require a substantial number of posed images as input. COLMAP is frequently employed for preprocessing to estimate poses. However, COLMAP necessitates a large number of feature matches to operate effectively, and struggles with scenes characterized by sparse features, large baselines, or few-view images. We aim to tackle few-view NeRF reconstruction using only 3 to 6 unposed scene images, freeing from COLMAP initializations. Inspired by the idea of calibration boards in traditional pose calibration, we propose a novel approach of utilizing everyday objects, commonly found in both images and real life, as “pose probes”. By initializing the probe object as a cube shape, we apply a dual-branch volume rendering optimization (object NeRF and scene NeRF) to constrain the pose optimization and jointly refine the geometry. PnP matching is used to initialize poses between images incrementally, where only a few feature matches are enough. PoseProbe achieves state-of-the-art performance in pose estimation and novel view synthesis across multiple datasets in experiments. We demonstrate its effectiveness, particularly in few-view and large-baseline scenes where COLMAP struggles. In ablations, using different objects in a scene yields comparable performance, showing that PoseProbe is robust to the choice of probe objects. Our project page is available at: https://zhirui-gao.github.io/PoseProbe.github.io/
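As described above, camera poses are initialized incrementally by PnP matching, where only a few feature correspondences on the probe object are needed. The snippet below is a minimal illustrative sketch (not the authors' implementation) of such an initialization using OpenCV's RANSAC-based PnP solver; the function name init_pose_from_probe and its inputs (3D points on the current probe geometry, their matched pixel locations, and the camera intrinsics) are assumptions made for illustration.

```python
# Illustrative sketch only: initialize a camera pose from sparse 2D-3D
# correspondences on a probe object via PnP + RANSAC. Names and inputs
# are hypothetical; the paper's pipeline refines this estimate further.
import cv2
import numpy as np

def init_pose_from_probe(pts_3d, pts_2d, K):
    """Estimate a world-to-camera pose from a few correspondences.

    pts_3d: (N, 3) points on the probe object's current geometry (e.g. a cube).
    pts_2d: (N, 2) matched pixel locations in the new image.
    K:      (3, 3) camera intrinsics.
    A handful of matches (N >= 4) is sufficient.
    """
    ok, rvec, tvec, inliers = cv2.solvePnPRansac(
        pts_3d.astype(np.float32),
        pts_2d.astype(np.float32),
        K.astype(np.float32),
        None,                        # no lens distortion assumed
        flags=cv2.SOLVEPNP_EPNP,
    )
    if not ok:
        raise RuntimeError("PnP failed: too few or degenerate matches")
    R, _ = cv2.Rodrigues(rvec)       # rotation vector -> 3x3 rotation matrix
    pose = np.eye(4)
    pose[:3, :3] = R                 # world-to-camera rotation
    pose[:3, 3] = tvec.ravel()       # world-to-camera translation
    return pose, inliers
```

In the paper's pipeline, such a PnP estimate serves only as an initialization; the poses are then refined jointly with the object and scene radiance fields through the dual-branch volume rendering optimization.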
Source Journal
CiteScore: 13.80
Self-citation rate: 27.40%
Articles published: 660
Review time: 5 months
Journal Introduction: The IEEE Transactions on Circuits and Systems for Video Technology (TCSVT) is dedicated to covering all aspects of video technologies from a circuits and systems perspective. We encourage submissions of general, theoretical, and application-oriented papers related to image and video acquisition, representation, presentation, and display. Additionally, we welcome contributions in areas such as processing, filtering, and transforms; analysis and synthesis; learning and understanding; compression, transmission, communication, and networking; as well as storage, retrieval, indexing, and search. Furthermore, papers focusing on hardware and software design and implementation are highly valued. Join us in advancing the field of video technology through innovative research and insights.