UrbanGen: Urban Generation with Compositional and Controllable Neural Fields
Yuanbo Yang, Yujun Shen, Yue Wang, Andreas Geiger, Yiyi Liao
IEEE Transactions on Pattern Analysis and Machine Intelligence, published online 2025-08-19
DOI: 10.1109/TPAMI.2025.3600440
Abstract
Despite the rapid progress in generative radiance fields, most existing methods focus on object-centric applications and are not able to generate complex urban scenes. In this paper, we propose UrbanGen, a solution for the challenging task of generating urban radiance fields with photorealistic rendering, accurate geometry, high controllability, and diverse city styles. Our key idea is to leverage a coarse 3D panoptic prior, represented by a semantic voxel grid for stuff and bounding boxes for countable objects, to condition a compositional generative radiance field. This panoptic prior simplifies the task of learning complex urban geometry, enables disentanglement of stuff and objects, and provides versatile control over both. Moreover, by combining semantic and geometry losses with adversarial training, our method faithfully adheres to the input conditions, allowing for joint rendering of semantic and depth maps alongside RGB images. In addition, we collect a unified dataset of images and their panoptic priors in a common format from three diverse real-world datasets: KITTI-360, nuScenes, and Waymo, and train a city style-aware model on this data. Our systematic study shows that UrbanGen outperforms state-of-the-art generative radiance field baselines in terms of image fidelity and geometry accuracy for urban scene generation. Furthermore, UrbanGen brings a new set of controllability features, including large camera movements, stuff editing, and city style control.
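To make the conditioning scheme concrete, below is a minimal, hypothetical sketch of how a coarse panoptic prior (semantic voxel grid plus object bounding boxes) could be represented and how adversarial, semantic, and depth terms could be combined during training. The class and function names, tensor shapes, and loss weights are assumptions for illustration only; they are not taken from the UrbanGen implementation.

```python
from dataclasses import dataclass

import torch
import torch.nn.functional as F


@dataclass
class PanopticPrior:
    """Coarse 3D panoptic prior conditioning the generator:
    a semantic voxel grid for 'stuff' and 3D boxes for countable objects."""
    semantic_voxels: torch.Tensor   # (D, H, W) integer semantic labels
    object_boxes: torch.Tensor      # (N, 7) per-object box: center(3), size(3), yaw
    object_classes: torch.Tensor    # (N,) integer class ids


def generator_loss(fake_logits: torch.Tensor,      # discriminator output on renders
                   pred_semantics: torch.Tensor,   # (B, C, H, W) rendered semantic logits
                   prior_semantics: torch.Tensor,  # (B, H, W) labels rasterized from the prior
                   pred_depth: torch.Tensor,       # (B, H, W) rendered depth
                   prior_depth: torch.Tensor,      # (B, H, W) coarse depth from the prior
                   lambda_sem: float = 1.0,
                   lambda_depth: float = 1.0) -> torch.Tensor:
    """Combine a non-saturating adversarial term with semantic and geometry
    reconstruction terms so renders stay faithful to the input conditions."""
    adv = F.softplus(-fake_logits).mean()                     # adversarial realism
    sem = F.cross_entropy(pred_semantics, prior_semantics)    # match semantic prior
    depth = F.l1_loss(pred_depth, prior_depth)                # match coarse geometry
    return adv + lambda_sem * sem + lambda_depth * depth
```

Under these assumptions, the separate voxel-grid and box components of the prior are what allow stuff and objects to be edited independently, while the semantic and depth terms tie the generated radiance field back to the supplied layout.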