LT3SD: Latent Trees for 3D Scene Diffusion
Quan Meng, Lei Li, Matthias Nießner, Angela Dai
arXiv - CS - Computer Vision and Pattern Recognition, 2024-09-12 (arXiv:2409.08215)
Citations: 0
Abstract
We present LT3SD, a novel latent diffusion model for large-scale 3D scene
generation. Recent advances in diffusion models have shown impressive results
in 3D object generation, but are limited in spatial extent and quality when
extended to 3D scenes. To generate complex and diverse 3D scene structures, we
introduce a latent tree representation to effectively encode both
lower-frequency geometry and higher-frequency detail in a coarse-to-fine
hierarchy. We can then learn a generative diffusion process in this latent 3D
scene space, modeling the latent components of a scene at each resolution
level. To synthesize large-scale scenes with varying sizes, we train our
diffusion model on scene patches and synthesize arbitrary-sized output 3D
scenes through shared diffusion generation across multiple scene patches.
Through extensive experiments, we demonstrate the efficacy and benefits of
LT3SD for large-scale, high-quality unconditional 3D scene generation and for
probabilistic completion of partial scene observations.
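The idea behind the latent tree, splitting a scene into a low-frequency coarse part plus higher-frequency detail at each resolution level, can be illustrated with a fixed, non-learned decomposition. The sketch below is an assumption, not the authors' method: it applies 2x average pooling and nearest-neighbour upsampling (a Laplacian-pyramid-style split) to a dense cubic grid, whereas LT3SD learns its latent features with neural encoders and decoders, which are not shown. The helper names `downsample`, `upsample`, `build_tree`, and `reconstruct` are hypothetical.

```python
import numpy as np


def downsample(grid: np.ndarray) -> np.ndarray:
    """Average-pool a cubic grid by 2 along every axis (low-frequency part)."""
    d = grid.shape[0] // 2
    return grid.reshape(d, 2, d, 2, d, 2).mean(axis=(1, 3, 5))


def upsample(grid: np.ndarray) -> np.ndarray:
    """Nearest-neighbour upsampling by 2 along every axis."""
    return grid.repeat(2, axis=0).repeat(2, axis=1).repeat(2, axis=2)


def build_tree(grid: np.ndarray, levels: int):
    """Split `grid` into a coarse root plus per-level detail residuals.

    Hypothetical illustration only: each residual holds what the next
    coarser level cannot represent, i.e. the higher-frequency part at that
    resolution. Returns (root, details) with details ordered coarse-to-fine.
    """
    if grid.shape[0] % (2 ** levels):
        raise ValueError("grid side must be divisible by 2 ** levels")
    details = []
    current = grid
    for _ in range(levels):
        coarse = downsample(current)
        details.append(current - upsample(coarse))  # high-frequency residual
        current = coarse
    details.reverse()  # coarse-to-fine order, matching the hierarchy
    return current, details


def reconstruct(root: np.ndarray, details) -> np.ndarray:
    """Invert build_tree: upsample and add the residuals back, coarse to fine."""
    grid = root
    for detail in details:
        grid = upsample(grid) + detail
    return grid


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    scene = rng.standard_normal((16, 16, 16))  # stand-in for a TSDF-like patch
    root, details = build_tree(scene, levels=3)
    print(root.shape, [d.shape for d in details])
    # (2, 2, 2) [(4, 4, 4), (8, 8, 8), (16, 16, 16)]
    print(np.allclose(reconstruct(root, details), scene))  # True
```

Each residual averages to zero within its 2x2x2 block, so in this toy version the coarse levels carry the overall layout and the residuals carry only local detail. That separation is what makes it reasonable to model each resolution level on its own.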
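Generating arbitrary-sized scenes from a model trained on fixed-size patches can be sketched as follows, assuming that the shared diffusion generation resembles MultiDiffusion-style fusion. At every denoising step the large scene latent is cut into overlapping patches of the training size, each patch is denoised independently by the same patch model, and overlapping predictions are averaged. The paper's exact fusion rule and sampler may differ. `denoise_patch` is a hypothetical callable that maps a noisy patch at step `t` directly to its less-noisy version; a real DDPM/DDIM sampler would instead predict noise and apply a scheduler update. Tiling runs only over the first two (horizontal) axes, on the assumption that the scene height already fits one patch.

```python
import numpy as np


def patch_starts(length: int, patch: int, stride: int) -> list:
    """Window offsets whose patches cover [0, length) with overlap."""
    if length < patch or not 0 < stride <= patch:
        raise ValueError("need length >= patch and 0 < stride <= patch")
    starts = list(range(0, length - patch + 1, stride))
    if starts[-1] != length - patch:
        starts.append(length - patch)  # cover the far edge exactly
    return starts


def shared_denoise_step(x, t, denoise_patch, patch, stride):
    """Denoise a scene larger than one patch by averaging overlapping patches."""
    out = np.zeros_like(x)
    count = np.zeros_like(x)
    for i in patch_starts(x.shape[0], patch, stride):
        for j in patch_starts(x.shape[1], patch, stride):
            window = (slice(i, i + patch), slice(j, j + patch))
            out[window] += denoise_patch(x[window], t)
            count[window] += 1.0
    return out / count  # every cell is covered at least once


def sample_scene(shape, denoise_patch, patch, stride, steps, seed=0):
    """Run a simplified reverse process from pure noise over a large grid."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(shape)
    for t in reversed(range(steps)):
        x = shared_denoise_step(x, t, denoise_patch, patch, stride)
    return x


def toy_denoiser(p, t):
    """Hypothetical stand-in for a trained patch model: shrinks values."""
    return 0.9 * p


if __name__ == "__main__":
    scene = sample_scene((40, 24, 16), toy_denoiser, patch=16, stride=8, steps=5)
    print(scene.shape)  # (40, 24, 16)
```

Averaging is the simplest way to make neighbouring patches agree where they overlap. Because every patch is produced by the same fixed-size model, the output extent is limited only by how many windows are tiled. A pointwise denoiser such as `toy_denoiser` gives exactly the result of applying it to the whole scene, which is a quick sanity check for the tiling logic.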