CtrlNeRF: The generative neural radiation fields for the controllable synthesis of high-fidelity 3D-aware images

IF 2.5 · CAS Tier 4, Computer Science · JCR Q2, COMPUTER SCIENCE, SOFTWARE ENGINEERING
Jian Liu, Zhen Yu
{"title":"CtrlNeRF: The generative neural radiation fields for the controllable synthesis of high-fidelity 3D-aware images","authors":"Jian Liu ,&nbsp;Zhen Yu","doi":"10.1016/j.cag.2025.104163","DOIUrl":null,"url":null,"abstract":"<div><div>The neural radiance field (NERF) advocates learning the continuous representation of 3D geometry through a multilayer perceptron (MLP). By integrating this into a generative model, the generative neural radiance field (GRAF) is capable of producing images from random noise <span><math><mi>z</mi></math></span> without 3D supervision. In practice, the shape and appearance are modeled by <span><math><msub><mrow><mi>z</mi></mrow><mrow><mi>s</mi></mrow></msub></math></span> and <span><math><msub><mrow><mi>z</mi></mrow><mrow><mi>a</mi></mrow></msub></math></span>, respectively, to manipulate them separately during inference. However, it is challenging to represent multiple scenes using a solitary MLP and precisely control the generation of 3D geometry in terms of shape and appearance. In this paper, we introduce a controllable generative model (<span><math><mrow><mi>i</mi><mo>.</mo><mi>e</mi><mo>.</mo></mrow></math></span> <strong>CtrlNeRF</strong>) that uses a single MLP network to represent multiple scenes with shared weights. Consequently, we manipulated the shape and appearance codes to realize the controllable generation of high-fidelity images with 3D consistency. Moreover, the model enables the synthesis of novel views that do not exist in the training sets via camera pose alteration and feature interpolation. Extensive experiments were conducted to demonstrate its superiority in 3D-aware image generation compared to its counterparts.</div></div>","PeriodicalId":50628,"journal":{"name":"Computers & Graphics-Uk","volume":"126 ","pages":"Article 104163"},"PeriodicalIF":2.5000,"publicationDate":"2025-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Computers & Graphics-Uk","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0097849325000020","RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, SOFTWARE ENGINEERING","Score":null,"Total":0}
Citations: 0

Abstract

The neural radiance field (NeRF) learns a continuous representation of 3D geometry through a multilayer perceptron (MLP). By integrating this representation into a generative model, the generative radiance field (GRAF) can produce images from random noise z without 3D supervision. In practice, shape and appearance are modeled by separate codes zs and za, respectively, so that they can be manipulated independently during inference. However, it is challenging to represent multiple scenes with a single MLP and to precisely control the generated 3D geometry in terms of shape and appearance. In this paper, we introduce a controllable generative model (CtrlNeRF) that uses a single MLP network with shared weights to represent multiple scenes. By manipulating the shape and appearance codes, the model achieves controllable generation of high-fidelity images with 3D consistency. Moreover, it can synthesize novel views that do not exist in the training set through camera pose alteration and feature interpolation. Extensive experiments demonstrate its superiority over counterpart methods in 3D-aware image generation.
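To make the conditioning described above concrete, the following is a minimal, hypothetical sketch of a GRAF-style conditional radiance field: a single MLP with shared weights whose density branch is modulated by a shape code zs and whose color branch is modulated by an appearance code za. This is not the authors' implementation; the class name `ConditionalRadianceField`, the layer sizes, and the latent dimension are assumptions for illustration.

```python
# Hypothetical sketch of a conditional radiance field with shared weights.
# One MLP represents many scenes; the latent codes z_s / z_a select among them.
import torch
import torch.nn as nn


class ConditionalRadianceField(nn.Module):
    def __init__(self, pos_dim=3, dir_dim=3, z_dim=128, hidden=256):
        super().__init__()
        # Shape code z_s modulates the geometry (density) branch.
        self.density_net = nn.Sequential(
            nn.Linear(pos_dim + z_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.sigma_head = nn.Linear(hidden, 1)
        # Appearance code z_a modulates the view-dependent color branch.
        self.color_net = nn.Sequential(
            nn.Linear(hidden + dir_dim + z_dim, hidden // 2), nn.ReLU(),
            nn.Linear(hidden // 2, 3), nn.Sigmoid(),
        )

    def forward(self, x, d, z_s, z_a):
        # x: (N, 3) sample positions, d: (N, 3) view directions,
        # z_s / z_a: (N, z_dim) shape / appearance codes broadcast over the batch.
        h = self.density_net(torch.cat([x, z_s], dim=-1))
        sigma = torch.relu(self.sigma_head(h))                  # volume density
        rgb = self.color_net(torch.cat([h, d, z_a], dim=-1))    # emitted color
        return rgb, sigma


# One shared network; different codes yield different scenes.
model = ConditionalRadianceField()
x = torch.rand(1024, 3)
d = torch.nn.functional.normalize(torch.rand(1024, 3), dim=-1)
z_s = torch.randn(1, 128).expand(1024, -1)   # sampled shape code
z_a = torch.randn(1, 128).expand(1024, -1)   # sampled appearance code
rgb, sigma = model(x, d, z_s, z_a)
```

The split conditioning is what allows shape and appearance to be edited separately at inference time: changing z_a alone recolors a scene whose geometry is fixed by z_s, and vice versa.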

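The two inference-time controls mentioned in the abstract, camera pose alteration and feature (latent code) interpolation, could be sketched as below. The helpers `lerp` and `look_at_pose` are hypothetical, and the actual volume-rendering call is left as a commented placeholder since the paper's renderer is not reproduced here.

```python
# Hypothetical illustration of novel-view synthesis controls:
# (1) interpolate shape/appearance codes between two generated scenes,
# (2) sweep the camera pose to render views absent from the training set.
import torch


def lerp(z0, z1, t):
    """Linear interpolation between two latent codes."""
    return (1.0 - t) * z0 + t * z1


def look_at_pose(radius, azimuth_deg, elevation_deg=15.0):
    """Camera position on a sphere around the object (pose alteration)."""
    az = torch.deg2rad(torch.tensor(azimuth_deg))
    el = torch.deg2rad(torch.tensor(elevation_deg))
    return torch.stack([radius * torch.cos(el) * torch.cos(az),
                        radius * torch.cos(el) * torch.sin(az),
                        radius * torch.sin(el)])


z_s0, z_s1 = torch.randn(128), torch.randn(128)   # two shape codes
z_a0, z_a1 = torch.randn(128), torch.randn(128)   # two appearance codes

for t in torch.linspace(0, 1, 5):                 # sweep the latent space
    z_s, z_a = lerp(z_s0, z_s1, t), lerp(z_a0, z_a1, t)
    for az in range(0, 360, 45):                  # sweep the camera azimuth
        cam_origin = look_at_pose(radius=2.5, azimuth_deg=float(az))
        # image = render_image(model, cam_origin, z_s, z_a)  # volume rendering step
```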

Source Journal
Computers & Graphics (UK) — Engineering & Technology, Computer Science: Software Engineering
CiteScore: 5.30
Self-citation rate: 12.00%
Articles per year: 173
Review time: 38 days
Aims and scope: Computers & Graphics is dedicated to disseminating information on research and applications of computer graphics (CG) techniques. The journal encourages articles on:
1. Research and applications of interactive computer graphics. We are particularly interested in novel interaction techniques and applications of CG to problem domains.
2. State-of-the-art papers on late-breaking, cutting-edge research on CG.
3. Information on innovative uses of graphics principles and technologies.
4. Tutorial papers on both teaching CG principles and innovative uses of CG in education.