Text-to-3D scene generation framework: bridging textual descriptions to high-fidelity 3D scenes.

IF 6.0 · JCR Q2 (Computer Science, Interdisciplinary Applications) · CAS Tier 4, Computer Science
Zuan Gu, Tianhan Gao, Huimin Liu
{"title":"Text-to-3D scene generation framework: bridging textual descriptions to high-fidelity 3D scenes.","authors":"Zuan Gu, Tianhan Gao, Huimin Liu","doi":"10.1186/s42492-025-00210-0","DOIUrl":null,"url":null,"abstract":"<p><p>Text-to-3D scene generation is pivotal for digital content creation; however, existing methods often struggle with global consistency across views. We present 3DS-Gen, a modular \"generate-then-reconstruct\" framework that first produces a temporally coherent multi-view video prior and then reconstructs consistent 3D scenes using sparse geometry estimation and Gaussian optimization. A cascaded variational autoencoder (2D for spatial compression and 3D for temporal compression) provides a compact and coherent latent sequence that facilitates robust reconstruction. An adaptive density threshold improves detailed allocation in the Gaussian stage under a fixed computational budget. While explicit meshes can be extracted from the optimized representation when needed, our claims emphasize multiview consistency and reconstructability; the mesh quality depends on the video prior and the chosen explicitification backend. 3DS-Gen runs on a single GPU and yields coherent scene reconstructions across diverse prompts, thereby providing a practical bridge between text and 3D content creation.</p>","PeriodicalId":29931,"journal":{"name":"Visual Computing for Industry Biomedicine and Art","volume":"8 1","pages":"29"},"PeriodicalIF":6.0000,"publicationDate":"2025-12-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12712286/pdf/","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Visual Computing for Industry Biomedicine and Art","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.1186/s42492-025-00210-0","RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS","Score":null,"Total":0}
Citations: 0

Abstract

Text-to-3D scene generation is pivotal for digital content creation; however, existing methods often struggle with global consistency across views. We present 3DS-Gen, a modular "generate-then-reconstruct" framework that first produces a temporally coherent multi-view video prior and then reconstructs consistent 3D scenes using sparse geometry estimation and Gaussian optimization. A cascaded variational autoencoder (2D for spatial compression, 3D for temporal compression) provides a compact, coherent latent sequence that facilitates robust reconstruction. An adaptive density threshold improves detail allocation in the Gaussian stage under a fixed computational budget. While explicit meshes can be extracted from the optimized representation when needed, our claims emphasize multi-view consistency and reconstructability; mesh quality depends on the video prior and the chosen mesh-extraction backend. 3DS-Gen runs on a single GPU and yields coherent scene reconstructions across diverse prompts, providing a practical bridge between text and 3D content creation.
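
The abstract's cascaded variational autoencoder pairs a 2D stage for per-frame spatial compression with a 3D stage for temporal compression of the resulting latent sequence. The following is a minimal sketch of that general idea only; all module names, channel counts, and strides are hypothetical and are not taken from the paper.

```python
# Hypothetical sketch of a cascaded 2D (spatial) + 3D (temporal) VAE encoder.
# Architecture details (channels, strides, activation) are assumptions.
import torch
import torch.nn as nn

class Spatial2DEncoder(nn.Module):
    """Compresses each frame independently: (B*T, 3, H, W) -> (B*T, C, H/4, W/4)."""
    def __init__(self, latent_channels: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, stride=2, padding=1),                # H/2, W/2
            nn.SiLU(),
            nn.Conv2d(32, latent_channels, kernel_size=3, stride=2, padding=1),  # H/4, W/4
            nn.SiLU(),
        )

    def forward(self, x):
        return self.net(x)

class Temporal3DEncoder(nn.Module):
    """Compresses the latent sequence along time: (B, C, T, h, w) -> (B, C, T/2, h, w)."""
    def __init__(self, channels: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv3d(channels, channels, kernel_size=3,
                      stride=(2, 1, 1), padding=1),   # halve the temporal axis only
            nn.SiLU(),
        )

    def forward(self, z):
        return self.net(z)

class CascadedVAEEncoder(nn.Module):
    """2D spatial compression per frame, then 3D temporal compression; ends with
    the mean/log-variance head of a standard diagonal-Gaussian VAE posterior."""
    def __init__(self, channels: int = 64):
        super().__init__()
        self.spatial = Spatial2DEncoder(channels)
        self.temporal = Temporal3DEncoder(channels)
        self.to_mu = nn.Conv3d(channels, channels, 1)
        self.to_logvar = nn.Conv3d(channels, channels, 1)

    def forward(self, video):                                # video: (B, T, 3, H, W)
        B, T, C, H, W = video.shape
        z = self.spatial(video.reshape(B * T, C, H, W))      # (B*T, c, H/4, W/4)
        z = z.reshape(B, T, *z.shape[1:]).permute(0, 2, 1, 3, 4)  # (B, c, T, h, w)
        z = self.temporal(z)                                 # (B, c, T/2, h, w)
        return self.to_mu(z), self.to_logvar(z)

frames = torch.randn(1, 8, 3, 64, 64)   # toy multi-view video: 8 frames of 64x64
mu, logvar = CascadedVAEEncoder()(frames)
print(mu.shape)                          # torch.Size([1, 64, 4, 16, 16])
```

The point of the cascade is that per-frame 2D compression keeps spatial detail cheap to encode, while the 3D stage ties adjacent frames together so the latent sequence stays temporally coherent for the reconstruction stage.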

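The "adaptive density threshold ... under a fixed computational budget" plausibly refers to the densification step of Gaussian splatting, where Gaussians with a large accumulated view-space gradient are cloned or split. One way to read it, sketched below, is a threshold that tightens as the Gaussian count approaches the budget; the quantile-free scaling rule and all names here are assumptions, not the paper's method.

```python
# Hypothetical sketch: budget-aware densification threshold for Gaussian splatting.
# The scaling rule below is an assumption, not the paper's actual criterion.
import torch

def adaptive_densify_mask(grad_norms: torch.Tensor,
                          num_gaussians: int,
                          budget: int,
                          base_threshold: float = 2e-4) -> torch.Tensor:
    """Return a boolean mask of Gaussians selected for densification.

    grad_norms: per-Gaussian accumulated view-space gradient magnitudes, shape (N,).
    """
    # Fraction of the fixed budget already consumed, clamped below 1.
    usage = min(num_gaussians / budget, 0.999)
    # Raise the bar as the budget fills: at 0% usage this is the base threshold;
    # near 100% usage almost nothing qualifies, capping total Gaussian count.
    threshold = base_threshold / (1.0 - usage)
    return grad_norms > threshold

# Toy usage: 10k Gaussians against a 50k budget.
grads = torch.rand(10_000) * 1e-3
mask = adaptive_densify_mask(grads, num_gaussians=10_000, budget=50_000)
print(f"densifying {int(mask.sum())} of {mask.numel()} Gaussians")
```

Under any rule of this shape, detail-rich regions (large gradients) keep winning new Gaussians early on, and allocation becomes increasingly selective as the fixed budget is exhausted.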
Source journal metrics: CiteScore 5.60 · Self-citation rate 0.00%