ReconX: Reconstruct Any Scene from Sparse Views with Video Diffusion Model

Fangfu Liu, Wenqiang Sun, Hanyang Wang, Yikai Wang, Haowen Sun, Junliang Ye, Jun Zhang, Yueqi Duan
{"title":"ReconX:利用视频扩散模型从稀疏视图重建任何场景","authors":"Fangfu Liu, Wenqiang Sun, Hanyang Wang, Yikai Wang, Haowen Sun, Junliang Ye, Jun Zhang, Yueqi Duan","doi":"arxiv-2408.16767","DOIUrl":null,"url":null,"abstract":"Advancements in 3D scene reconstruction have transformed 2D images from the\nreal world into 3D models, producing realistic 3D results from hundreds of\ninput photos. Despite great success in dense-view reconstruction scenarios,\nrendering a detailed scene from insufficient captured views is still an\nill-posed optimization problem, often resulting in artifacts and distortions in\nunseen areas. In this paper, we propose ReconX, a novel 3D scene reconstruction\nparadigm that reframes the ambiguous reconstruction challenge as a temporal\ngeneration task. The key insight is to unleash the strong generative prior of\nlarge pre-trained video diffusion models for sparse-view reconstruction.\nHowever, 3D view consistency struggles to be accurately preserved in directly\ngenerated video frames from pre-trained models. To address this, given limited\ninput views, the proposed ReconX first constructs a global point cloud and\nencodes it into a contextual space as the 3D structure condition. Guided by the\ncondition, the video diffusion model then synthesizes video frames that are\nboth detail-preserved and exhibit a high degree of 3D consistency, ensuring the\ncoherence of the scene from various perspectives. Finally, we recover the 3D\nscene from the generated video through a confidence-aware 3D Gaussian Splatting\noptimization scheme. Extensive experiments on various real-world datasets show\nthe superiority of our ReconX over state-of-the-art methods in terms of quality\nand generalizability.","PeriodicalId":501174,"journal":{"name":"arXiv - CS - Graphics","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2024-08-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"ReconX: Reconstruct Any Scene from Sparse Views with Video Diffusion Model\",\"authors\":\"Fangfu Liu, Wenqiang Sun, Hanyang Wang, Yikai Wang, Haowen Sun, Junliang Ye, Jun Zhang, Yueqi Duan\",\"doi\":\"arxiv-2408.16767\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Advancements in 3D scene reconstruction have transformed 2D images from the\\nreal world into 3D models, producing realistic 3D results from hundreds of\\ninput photos. Despite great success in dense-view reconstruction scenarios,\\nrendering a detailed scene from insufficient captured views is still an\\nill-posed optimization problem, often resulting in artifacts and distortions in\\nunseen areas. In this paper, we propose ReconX, a novel 3D scene reconstruction\\nparadigm that reframes the ambiguous reconstruction challenge as a temporal\\ngeneration task. The key insight is to unleash the strong generative prior of\\nlarge pre-trained video diffusion models for sparse-view reconstruction.\\nHowever, 3D view consistency struggles to be accurately preserved in directly\\ngenerated video frames from pre-trained models. To address this, given limited\\ninput views, the proposed ReconX first constructs a global point cloud and\\nencodes it into a contextual space as the 3D structure condition. Guided by the\\ncondition, the video diffusion model then synthesizes video frames that are\\nboth detail-preserved and exhibit a high degree of 3D consistency, ensuring the\\ncoherence of the scene from various perspectives. 
Finally, we recover the 3D\\nscene from the generated video through a confidence-aware 3D Gaussian Splatting\\noptimization scheme. Extensive experiments on various real-world datasets show\\nthe superiority of our ReconX over state-of-the-art methods in terms of quality\\nand generalizability.\",\"PeriodicalId\":501174,\"journal\":{\"name\":\"arXiv - CS - Graphics\",\"volume\":null,\"pages\":null},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2024-08-29\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"arXiv - CS - Graphics\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/arxiv-2408.16767\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"arXiv - CS - Graphics","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/arxiv-2408.16767","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

Advancements in 3D scene reconstruction have transformed 2D images from the real world into 3D models, producing realistic 3D results from hundreds of input photos. Despite great success in dense-view reconstruction scenarios, rendering a detailed scene from an insufficient set of captured views is still an ill-posed optimization problem, often resulting in artifacts and distortions in unseen areas. In this paper, we propose ReconX, a novel 3D scene reconstruction paradigm that reframes the ambiguous reconstruction challenge as a temporal generation task. The key insight is to unleash the strong generative prior of large pre-trained video diffusion models for sparse-view reconstruction. However, 3D view consistency is difficult to preserve in video frames generated directly from pre-trained models. To address this, given limited input views, the proposed ReconX first constructs a global point cloud and encodes it into a contextual space as the 3D structure condition. Guided by this condition, the video diffusion model then synthesizes video frames that are both detail-preserving and highly 3D-consistent, ensuring the coherence of the scene from various perspectives. Finally, we recover the 3D scene from the generated video through a confidence-aware 3D Gaussian Splatting optimization scheme. Extensive experiments on various real-world datasets show the superiority of ReconX over state-of-the-art methods in terms of quality and generalizability.
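The final stage described in the abstract, recovering the scene via confidence-aware 3D Gaussian Splatting, amounts to weighting the reconstruction objective by how much each generated frame or region can be trusted. The sketch below is a minimal, hypothetical illustration of such a confidence-weighted loss in PyTorch; it is not the paper's exact formulation, and the `lpips_fn` callable, the per-pixel/per-frame weighting scheme, and the `lam` coefficient are assumptions introduced only for illustration.

```python
import torch

def confidence_weighted_loss(rendered, generated, confidence, lpips_fn=None, lam=0.2):
    """Confidence-aware reconstruction loss (illustrative sketch, not the paper's exact objective).

    rendered:   (N, 3, H, W) images rasterized from the current 3D Gaussians
    generated:  (N, 3, H, W) frames synthesized by the video diffusion model
    confidence: (N, 1, H, W) weights in [0, 1]; low values down-weight frames
                or regions where the generated video is less 3D-consistent
    lpips_fn:   optional callable returning a per-frame perceptual distance of shape (N,)
    """
    # Per-pixel L1 photometric term, weighted by the confidence map so that
    # unreliable generated content contributes less to the Gaussian update.
    photo = (confidence * (rendered - generated).abs()).mean()

    if lpips_fn is None:
        return photo

    # Optional perceptual term, weighted per frame by its mean confidence.
    frame_w = confidence.mean(dim=(1, 2, 3))  # (N,)
    perceptual = (frame_w * lpips_fn(rendered, generated).reshape(-1)).mean()
    return photo + lam * perceptual
```

In a full pipeline, this loss would be evaluated between the frames generated by the diffusion model and the views rasterized from the current Gaussian set at the corresponding camera poses; the confidence maps could plausibly come from the same matching stage that builds the global point cloud, though the abstract does not specify their source.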