V4D:用于4D新视图合成的体素

IF 6.5 1区计算机科学 Q1 COMPUTER SCIENCE, SOFTWARE ENGINEERING

IEEE Transactions on Visualization and Computer Graphics Pub Date : 2022-05-28 DOI:10.48550/arXiv.2205.14332

Wanshui Gan, Hongbin Xu, Yi Huang, Shifeng Chen, N. Yokoya

{"title":"V4D:用于4D新视图合成的体素","authors":"Wanshui Gan, Hongbin Xu, Yi Huang, Shifeng Chen, N. Yokoya","doi":"10.48550/arXiv.2205.14332","DOIUrl":null,"url":null,"abstract":"Neural radiance fields have made a remarkable breakthrough in the novel view synthesis task at the 3D static scene. However, for the 4D circumstance (e.g., dynamic scene), the performance of the existing method is still limited by the capacity of the neural network, typically in a multilayer perceptron network (MLP). In this paper, we utilize 3D Voxel to model the 4D neural radiance field, short as V4D, where the 3D voxel has two formats. The first one is to regularly model the 3D space and then use the sampled local 3D feature with the time index to model the density field and the texture field by a tiny MLP. The second one is in look-up tables (LUTs) format that is for the pixel-level refinement, where the pseudo-surface produced by the volume rendering is utilized as the guidance information to learn a 2D pixel-level refinement mapping. The proposed LUTs-based refinement module achieves the performance gain with little computational cost and could serve as the plug-and-play module in the novel view synthesis task. Moreover, we propose a more effective conditional positional encoding toward the 4D data that achieves performance gain with negligible computational burdens. Extensive experiments demonstrate that the proposed method achieves state-of-the-art performance at a low computational cost. The relevant code is available in https://github.com/GANWANSHUI/V4D.","PeriodicalId":13376,"journal":{"name":"IEEE Transactions on Visualization and Computer Graphics","volume":" ","pages":""},"PeriodicalIF":6.5000,"publicationDate":"2022-05-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"18","resultStr":"{\"title\":\"V4D: Voxel for 4D Novel View Synthesis\",\"authors\":\"Wanshui Gan, Hongbin Xu, Yi Huang, Shifeng Chen, N. Yokoya\",\"doi\":\"10.48550/arXiv.2205.14332\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Neural radiance fields have made a remarkable breakthrough in the novel view synthesis task at the 3D static scene. However, for the 4D circumstance (e.g., dynamic scene), the performance of the existing method is still limited by the capacity of the neural network, typically in a multilayer perceptron network (MLP). In this paper, we utilize 3D Voxel to model the 4D neural radiance field, short as V4D, where the 3D voxel has two formats. The first one is to regularly model the 3D space and then use the sampled local 3D feature with the time index to model the density field and the texture field by a tiny MLP. The second one is in look-up tables (LUTs) format that is for the pixel-level refinement, where the pseudo-surface produced by the volume rendering is utilized as the guidance information to learn a 2D pixel-level refinement mapping. The proposed LUTs-based refinement module achieves the performance gain with little computational cost and could serve as the plug-and-play module in the novel view synthesis task. Moreover, we propose a more effective conditional positional encoding toward the 4D data that achieves performance gain with negligible computational burdens. Extensive experiments demonstrate that the proposed method achieves state-of-the-art performance at a low computational cost. The relevant code is available in https://github.com/GANWANSHUI/V4D.\",\"PeriodicalId\":13376,\"journal\":{\"name\":\"IEEE Transactions on Visualization and Computer Graphics\",\"volume\":\" \",\"pages\":\"\"},\"PeriodicalIF\":6.5000,\"publicationDate\":\"2022-05-28\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"18\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"IEEE Transactions on Visualization and Computer Graphics\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://doi.org/10.48550/arXiv.2205.14332\",\"RegionNum\":1,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"COMPUTER SCIENCE, SOFTWARE ENGINEERING\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Visualization and Computer Graphics","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.48550/arXiv.2205.14332","RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, SOFTWARE ENGINEERING","Score":null,"Total":0}

引用次数: 18

摘要

神经辐射场在三维静态场景的新型视图合成任务中取得了显著的突破。然而，对于4D环境(如动态场景)，现有方法的性能仍然受到神经网络容量的限制，特别是在多层感知器网络(MLP)中。在本文中，我们利用3D体素来建模4D神经辐射场，简称V4D，其中3D体素有两种格式。第一种方法是对三维空间进行规则建模，然后利用采样的局部三维特征和时间索引，通过一个微小的MLP对密度场和纹理场进行建模。第二种是用于像素级细化的查找表(LUTs)格式，其中利用体渲染产生的伪表面作为指导信息来学习2D像素级细化映射。提出的基于lut的优化模块以较小的计算成本实现了性能提升，可以作为新型视图合成任务的即插即用模块。此外，我们提出了一种更有效的4D数据条件位置编码，在计算负担可以忽略不计的情况下实现性能提升。大量的实验表明，该方法以较低的计算成本达到了最先进的性能。相关代码可在https://github.com/GANWANSHUI/V4D中获得。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

V4D: Voxel for 4D Novel View Synthesis

Neural radiance fields have made a remarkable breakthrough in the novel view synthesis task at the 3D static scene. However, for the 4D circumstance (e.g., dynamic scene), the performance of the existing method is still limited by the capacity of the neural network, typically in a multilayer perceptron network (MLP). In this paper, we utilize 3D Voxel to model the 4D neural radiance field, short as V4D, where the 3D voxel has two formats. The first one is to regularly model the 3D space and then use the sampled local 3D feature with the time index to model the density field and the texture field by a tiny MLP. The second one is in look-up tables (LUTs) format that is for the pixel-level refinement, where the pseudo-surface produced by the volume rendering is utilized as the guidance information to learn a 2D pixel-level refinement mapping. The proposed LUTs-based refinement module achieves the performance gain with little computational cost and could serve as the plug-and-play module in the novel view synthesis task. Moreover, we propose a more effective conditional positional encoding toward the 4D data that achieves performance gain with negligible computational burdens. Extensive experiments demonstrate that the proposed method achieves state-of-the-art performance at a low computational cost. The relevant code is available in https://github.com/GANWANSHUI/V4D.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

IEEE Transactions on Visualization and Computer Graphics 工程技术-计算机：软件工程

CiteScore

10.40

自引率

19.20%

发文量

946

审稿时长

4.5 months

期刊介绍： TVCG is a scholarly, archival journal published monthly. Its Editorial Board strives to publish papers that present important research results and state-of-the-art seminal papers in computer graphics, visualization, and virtual reality. Specific topics include, but are not limited to: rendering technologies; geometric modeling and processing; shape analysis; graphics hardware; animation and simulation; perception, interaction and user interfaces; haptics; computational photography; high-dynamic range imaging and display; user studies and evaluation; biomedical visualization; volume visualization and graphics; visual analytics for machine learning; topology-based visualization; visual programming and software visualization; visualization in data science; virtual reality, augmented reality and mixed reality; advanced display technology, (e.g., 3D, immersive and multi-modal displays); applications of computer graphics and visualization.