Hao Chen, Jiafu Wu, Ying Jin, Jinlong Peng, Xiaofeng Mao, Mingmin Chi, Mufeng Yao, Bo Peng, Jian Li, Yun Cao
arXiv - CS - Computer Vision and Pattern Recognition, published 2024-09-12, https://doi.org/arxiv-2409.08207
VI3DRM: Towards Meticulous 3D Reconstruction from Sparse Views via Photo-Realistic Novel View Synthesis
Abstract
Recently, methods like Zero-1-2-3 have focused on single-view-based 3D
reconstruction and have achieved remarkable success. However, their predictions
for unseen areas heavily rely on the inductive bias of large-scale pretrained
diffusion models. Although subsequent work, such as DreamComposer, attempts to
make predictions more controllable by incorporating additional views, the
results remain unrealistic due to feature entanglement in the vanilla latent
space, including factors such as lighting, material, and structure. To address
these issues, we introduce the Visual Isotropy 3D Reconstruction Model
(VI3DRM), a diffusion-based sparse-view 3D reconstruction model that operates
within an ID-consistent and perspective-disentangled 3D latent space. By
facilitating the disentanglement of semantic information, color, material
properties and lighting, VI3DRM is capable of generating highly realistic
images that are indistinguishable from real photographs. By leveraging both
real and synthesized images, our approach enables the accurate construction of
pointmaps, ultimately producing finely textured meshes or point clouds. On the
novel view synthesis (NVS) task, tested on the Google Scanned Objects (GSO)
dataset, VI3DRM significantly outperforms the state-of-the-art method
DreamComposer, achieving a PSNR of 38.61, an SSIM of
0.929, and an LPIPS of 0.027. Code will be made available upon publication.
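
For readers unfamiliar with the first of the reported metrics, the following is a minimal sketch of how PSNR is conventionally computed between a reference image and a synthesized view. It is not the authors' evaluation code: the function name `psnr`, the flat list-of-floats image representation, and the assumption that pixel values lie in [0, 1] (so `max_value = 1.0`) are illustrative choices made here, and the abstract does not state the normalization actually used.

```python
import math
from typing import Sequence


def psnr(reference: Sequence[float], rendered: Sequence[float],
         max_value: float = 1.0) -> float:
    """Peak signal-to-noise ratio in dB between two equally sized pixel sequences.

    Assumes (hypothetically) that both images are flattened to plain floats
    and that max_value is the largest possible pixel value.
    """
    if len(reference) != len(rendered) or len(reference) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    # Mean squared error over all pixel values.
    mse = sum((r - s) ** 2 for r, s in zip(reference, rendered)) / len(reference)
    if mse == 0.0:
        return math.inf  # identical images have unbounded PSNR
    return 10.0 * math.log10(max_value ** 2 / mse)


# Toy example: every pixel is off by 0.01 on a [0, 1] scale.
reference = [0.2, 0.5, 0.8, 1.0]
rendered = [0.21, 0.51, 0.81, 0.99]
print(round(psnr(reference, rendered), 2))
```

Higher PSNR means a smaller pixel-wise error. Under this definition and the [0, 1] assumption, a per-pixel error of 0.01 everywhere gives 40 dB, so a score in the high 30s corresponds to a root-mean-square error of roughly one percent of the dynamic range. SSIM and LPIPS measure structural and learned perceptual similarity respectively and are not covered by this sketch.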