SG-NeRF: Sparse-Input Generalized Neural Radiance Fields for Novel View Synthesis

IF 1.3 · Zone 3 (Computer Science) · JCR Q4: COMPUTER SCIENCE, HARDWARE & ARCHITECTURE

Journal of Computer Science and Technology · Pub Date: 2024-06-26 · DOI: 10.1007/s11390-024-4157-6

Kuo Xu, Jie Li, Zhen-Qiang Li, Yang-Jie Cao


Citations: 0

Abstract

Traditional neural radiance fields for rendering novel views require dense input images and per-scene optimization, which limits their practical applications. We propose SG-NeRF (Sparse-Input Generalized Neural Radiance Fields), a generalizable method that infers scenes from input images and performs high-quality rendering without per-scene optimization. First, we construct an improved multi-view stereo structure based on convolutional attention and a multi-level fusion mechanism to obtain the geometric and appearance features of the scene from sparse input images; these features are then aggregated by multi-head attention and used as the input to the neural radiance field. Using the neural radiance field to decode scene features, rather than to map positions and orientations, allows our method to train and infer across scenes, so that it generalizes to novel view synthesis on unseen scenes. We tested generalization on the DTU dataset, where our PSNR (peak signal-to-noise ratio) improved by 3.14 dB over the baseline method under the same input conditions. In addition, if dense input views of a scene are available, a short period of further refinement training can improve the average PSNR by 1.04 dB and yield higher-quality renderings.
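To put the reported PSNR gains in context, here is a minimal sketch of the standard PSNR definition, 10 * log10(MAX^2 / MSE), and of what a fixed gain implies for mean squared error. It is not the authors' evaluation code. The function names, the flat-list image representation, and the pixel range of [0, 1] are assumptions made for illustration, as is reading the reported gains as decibels, the usual unit for PSNR.

```python
import math


def mse(rendered, reference):
    """Mean squared error between two equally sized sequences of pixel values."""
    if len(rendered) != len(reference) or not rendered:
        raise ValueError("inputs must be non-empty and the same length")
    return sum((a - b) ** 2 for a, b in zip(rendered, reference)) / len(rendered)


def psnr(rendered, reference, max_value=1.0):
    """PSNR in dB, using the standard definition 10 * log10(max_value^2 / MSE).

    max_value=1.0 assumes pixel intensities normalized to [0, 1] (an assumption).
    Returns infinity when the two images are identical.
    """
    err = mse(rendered, reference)
    if err == 0:
        return math.inf
    return 10.0 * math.log10(max_value ** 2 / err)


def mse_ratio_for_gain(gain_db):
    """Factor by which MSE is multiplied when PSNR rises by gain_db decibels."""
    return 10.0 ** (-gain_db / 10.0)


if __name__ == "__main__":
    # Toy pixel values, made up for illustration only.
    reference = [0.2, 0.5, 0.8, 1.0]
    rendered = [0.25, 0.45, 0.8, 0.9]
    print(f"PSNR: {psnr(rendered, reference):.2f} dB")
    print(f"MSE factor for +3.14 dB: {mse_ratio_for_gain(3.14):.3f}")
    print(f"MSE factor for +1.04 dB: {mse_ratio_for_gain(1.04):.3f}")
```

Under these assumptions, a 3.14 dB gain corresponds to roughly halving the mean squared error against the ground-truth view, and a 1.04 dB gain to a reduction of about one fifth.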


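Source journal

Journal of Computer Science and Technology (Engineering & Technology, Computer Science: Software Engineering)

CiteScore: 4.00

Self-citation rate: 0.00%

Articles published: 2255

Review time: 9.8 months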

About the journal: Journal of Computer Science and Technology (JCST), the first English language journal in the computer field published in China, is an international forum for scientists and engineers involved in all aspects of computer science and technology to publish high quality and refereed papers. Papers reporting original research and innovative applications from all parts of the world are welcome. Papers for publication in the journal are selected through rigorous peer review, to ensure originality, timeliness, relevance, and readability. While the journal emphasizes the publication of previously unpublished materials, selected conference papers with exceptional merit that require wider exposure are, at the discretion of the editors, also published, provided they meet the journal's peer review standards. The journal also seeks clearly written survey and review articles from experts in the field, to promote insightful understanding of the state-of-the-art and technology trends. Topics covered by Journal of Computer Science and Technology include but are not limited to:
- Computer Architecture and Systems
- Artificial Intelligence and Pattern Recognition
- Computer Networks and Distributed Computing
- Computer Graphics and Multimedia
- Software Systems
- Data Management and Data Mining
- Theory and Algorithms
- Emerging Areas