DA4NeRF: Depth-aware Augmentation technique for Neural Radiance Fields

IF 2.6; Zone 4, Computer Science; JCR Q2, COMPUTER SCIENCE, INFORMATION SYSTEMS

Journal of Visual Communication and Image Representation. Pub Date: 2024-12-09. DOI: 10.1016/j.jvcir.2024.104365

Hamed Razavi Khosroshahi, Jaime Sancho, Gun Bang, Gauthier Lafruit, Eduardo Juarez, Mehrdad Teratani

Volume 107, Article 104365. Source: https://www.sciencedirect.com/science/article/pii/S1047320324003213

Citations: 0

Abstract

Neural Radiance Fields (NeRF) demonstrate impressive capabilities in rendering novel views of specific scenes by learning an implicit volumetric representation from posed RGB images without any depth information. View synthesis is the computational process of synthesizing novel images of a scene from different viewpoints, based on a set of existing images. A major problem is that neural network-based view synthesis frameworks need a large number of images in their training datasets. The challenge of data augmentation for view synthesis applications has not been addressed yet. NeRF models require comprehensive scene coverage across multiple views to accurately estimate radiance and density at any point. Without sufficient coverage of the scene from different viewing directions, NeRF cannot effectively interpolate or extrapolate unseen scene parts. In this paper, we introduce a new pipeline that tackles this data augmentation problem using depth data. We use MPEG’s Depth Estimation Reference Software and Reference View Synthesizer to add novel, non-existent views to the training sets needed for the NeRF framework. Experimental results show that our approach improves the quality of images rendered by the NeRF model. The average quality increased by 6.4 dB in terms of Peak Signal-to-Noise Ratio (PSNR), with the highest increase being 11 dB. Our approach not only enables the NeRF framework to handle sparsely captured multiview content, but also makes NeRF more accurate and useful for creating high-quality virtual views.
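
To make the reported PSNR gains concrete, the following is a minimal sketch of the standard PSNR definition, 10 * log10(MAX^2 / MSE), together with a helper that converts a dB gain into the corresponding reduction in mean squared error. It is not the authors' evaluation code: the function names `psnr` and `mse_ratio_from_gain`, the flat-sequence input format, and the 8-bit peak value of 255 are assumptions made for illustration.

```python
import math


def psnr(reference, rendered, max_value=255.0):
    """PSNR in dB between two equal-length sequences of pixel values.

    Assumption: pixels are given as flat sequences (e.g. a flattened image)
    and max_value is the peak signal value (255 for 8-bit images).
    """
    if len(reference) != len(rendered) or len(reference) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    # Mean squared error between the reference and the rendered pixels.
    mse = sum((a - b) ** 2 for a, b in zip(reference, rendered)) / len(reference)
    if mse == 0:
        return math.inf  # identical images: PSNR is unbounded
    return 10.0 * math.log10(max_value ** 2 / mse)


def mse_ratio_from_gain(gain_db):
    """Factor by which MSE shrinks for a PSNR gain of gain_db decibels."""
    return 10.0 ** (gain_db / 10.0)


if __name__ == "__main__":
    print(round(mse_ratio_from_gain(6.4), 2))
    print(round(mse_ratio_from_gain(11.0), 2))
```

Under this definition, and assuming the same peak value before and after augmentation, a 6.4 dB gain corresponds to a mean squared error roughly 4.4 times lower, and an 11 dB gain to one roughly 12.6 times lower.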


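Source journal

Journal of Visual Communication and Image Representation (Engineering & Technology, Computer Science: Software Engineering)

CiteScore: 5.40
Self-citation rate: 11.50%
Articles published: 188
Review time: 9.9 months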

About the journal: The Journal of Visual Communication and Image Representation publishes papers on state-of-the-art visual communication and image representation, with emphasis on novel technologies and theoretical work in this multidisciplinary area of pure and applied research. The field of visual communication and image representation is considered in its broadest sense and covers both digital and analog aspects as well as processing and communication in biological visual systems.