CtrlNeRF: The generative neural radiation fields for the controllable synthesis of high-fidelity 3D-aware images

IF 2.5 · CAS Tier 4, Computer Science · JCR Q2, COMPUTER SCIENCE, SOFTWARE ENGINEERING
Jian Liu, Zhen Yu
{"title":"CtrlNeRF: The generative neural radiation fields for the controllable synthesis of high-fidelity 3D-aware images","authors":"Jian Liu ,&nbsp;Zhen Yu","doi":"10.1016/j.cag.2025.104163","DOIUrl":null,"url":null,"abstract":"<div><div>The neural radiance field (NERF) advocates learning the continuous representation of 3D geometry through a multilayer perceptron (MLP). By integrating this into a generative model, the generative neural radiance field (GRAF) is capable of producing images from random noise <span><math><mi>z</mi></math></span> without 3D supervision. In practice, the shape and appearance are modeled by <span><math><msub><mrow><mi>z</mi></mrow><mrow><mi>s</mi></mrow></msub></math></span> and <span><math><msub><mrow><mi>z</mi></mrow><mrow><mi>a</mi></mrow></msub></math></span>, respectively, to manipulate them separately during inference. However, it is challenging to represent multiple scenes using a solitary MLP and precisely control the generation of 3D geometry in terms of shape and appearance. In this paper, we introduce a controllable generative model (<span><math><mrow><mi>i</mi><mo>.</mo><mi>e</mi><mo>.</mo></mrow></math></span> <strong>CtrlNeRF</strong>) that uses a single MLP network to represent multiple scenes with shared weights. Consequently, we manipulated the shape and appearance codes to realize the controllable generation of high-fidelity images with 3D consistency. Moreover, the model enables the synthesis of novel views that do not exist in the training sets via camera pose alteration and feature interpolation. Extensive experiments were conducted to demonstrate its superiority in 3D-aware image generation compared to its counterparts.</div></div>","PeriodicalId":50628,"journal":{"name":"Computers & Graphics-Uk","volume":"126 ","pages":"Article 104163"},"PeriodicalIF":2.5000,"publicationDate":"2025-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Computers & Graphics-Uk","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0097849325000020","RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, SOFTWARE ENGINEERING","Score":null,"Total":0}
Citations: 0

Abstract

The neural radiance field (NeRF) learns a continuous representation of 3D geometry through a multilayer perceptron (MLP). By integrating this representation into a generative model, the generative radiance field (GRAF) can produce images from random noise z without 3D supervision. In practice, shape and appearance are modeled by separate codes zs and za, respectively, so that they can be manipulated independently during inference. However, it is challenging to represent multiple scenes with a single MLP and to precisely control the generated 3D geometry in terms of shape and appearance. In this paper, we introduce a controllable generative model (CtrlNeRF) that uses a single MLP network with shared weights to represent multiple scenes. By manipulating the shape and appearance codes, the model achieves controllable generation of high-fidelity images with 3D consistency. Moreover, it can synthesize novel views that do not exist in the training set through camera pose alteration and feature interpolation. Extensive experiments demonstrate its superiority over counterpart methods in 3D-aware image generation.
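To make the conditioning described above concrete, the following is a minimal, hypothetical sketch of a GRAF-style conditional radiance field: a single MLP with shared weights whose density branch is modulated by a shape code zs and whose color branch is modulated by an appearance code za. This is not the authors' implementation; the class name `ConditionalRadianceField`, the layer sizes, and the latent dimension are assumptions for illustration.

```python
# Hypothetical sketch of a conditional radiance field with shared weights.
# One MLP represents many scenes; the latent codes z_s / z_a select among them.
import torch
import torch.nn as nn


class ConditionalRadianceField(nn.Module):
    def __init__(self, pos_dim=3, dir_dim=3, z_dim=128, hidden=256):
        super().__init__()
        # Shape code z_s modulates the geometry (density) branch.
        self.density_net = nn.Sequential(
            nn.Linear(pos_dim + z_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.sigma_head = nn.Linear(hidden, 1)
        # Appearance code z_a modulates the view-dependent color branch.
        self.color_net = nn.Sequential(
            nn.Linear(hidden + dir_dim + z_dim, hidden // 2), nn.ReLU(),
            nn.Linear(hidden // 2, 3), nn.Sigmoid(),
        )

    def forward(self, x, d, z_s, z_a):
        # x: (N, 3) sample positions, d: (N, 3) view directions,
        # z_s / z_a: (N, z_dim) shape / appearance codes broadcast over the batch.
        h = self.density_net(torch.cat([x, z_s], dim=-1))
        sigma = torch.relu(self.sigma_head(h))                  # volume density
        rgb = self.color_net(torch.cat([h, d, z_a], dim=-1))    # emitted color
        return rgb, sigma


# One shared network; different codes yield different scenes.
model = ConditionalRadianceField()
x = torch.rand(1024, 3)
d = torch.nn.functional.normalize(torch.rand(1024, 3), dim=-1)
z_s = torch.randn(1, 128).expand(1024, -1)   # sampled shape code
z_a = torch.randn(1, 128).expand(1024, -1)   # sampled appearance code
rgb, sigma = model(x, d, z_s, z_a)
```

The split conditioning is what allows shape and appearance to be edited separately at inference time: changing z_a alone recolors a scene whose geometry is fixed by z_s, and vice versa.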

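The two inference-time controls mentioned in the abstract, camera pose alteration and feature (latent code) interpolation, could be sketched as below. The helpers `lerp` and `look_at_pose` are hypothetical, and the actual volume-rendering call is left as a commented placeholder since the paper's renderer is not reproduced here.

```python
# Hypothetical illustration of novel-view synthesis controls:
# (1) interpolate shape/appearance codes between two generated scenes,
# (2) sweep the camera pose to render views absent from the training set.
import torch


def lerp(z0, z1, t):
    """Linear interpolation between two latent codes."""
    return (1.0 - t) * z0 + t * z1


def look_at_pose(radius, azimuth_deg, elevation_deg=15.0):
    """Camera position on a sphere around the object (pose alteration)."""
    az = torch.deg2rad(torch.tensor(azimuth_deg))
    el = torch.deg2rad(torch.tensor(elevation_deg))
    return torch.stack([radius * torch.cos(el) * torch.cos(az),
                        radius * torch.cos(el) * torch.sin(az),
                        radius * torch.sin(el)])


z_s0, z_s1 = torch.randn(128), torch.randn(128)   # two shape codes
z_a0, z_a1 = torch.randn(128), torch.randn(128)   # two appearance codes

for t in torch.linspace(0, 1, 5):                 # sweep the latent space
    z_s, z_a = lerp(z_s0, z_s1, t), lerp(z_a0, z_a1, t)
    for az in range(0, 360, 45):                  # sweep the camera azimuth
        cam_origin = look_at_pose(radius=2.5, azimuth_deg=float(az))
        # image = render_image(model, cam_origin, z_s, z_a)  # volume rendering step
```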

Source Journal
Computers & Graphics (UK) — Engineering & Technology, Computer Science: Software Engineering
CiteScore: 5.30
Self-citation rate: 12.00%
Articles per year: 173
Review time: 38 days
Aims and scope: Computers & Graphics is dedicated to disseminating information on research and applications of computer graphics (CG) techniques. The journal encourages articles on:
1. Research and applications of interactive computer graphics. We are particularly interested in novel interaction techniques and applications of CG to problem domains.
2. State-of-the-art papers on late-breaking, cutting-edge research on CG.
3. Information on innovative uses of graphics principles and technologies.
4. Tutorial papers on both teaching CG principles and innovative uses of CG in education.