SAILOR：协同辐射场和占位场捕捉真人表演

ACM Transactions on Graphics (TOG) Pub Date : 2023-12-04 DOI:10.1145/3618370

Zheng Dong, Ke Xu, Yaoan Gao, Qilin Sun, Hujun Bao, Weiwei Xu, Rynson W. H. Lau

{"title":"SAILOR：协同辐射场和占位场捕捉真人表演","authors":"Zheng Dong, Ke Xu, Yaoan Gao, Qilin Sun, Hujun Bao, Weiwei Xu, Rynson W. H. Lau","doi":"10.1145/3618370","DOIUrl":null,"url":null,"abstract":"Immersive user experiences in live VR/AR performances require a fast and accurate free-view rendering of the performers. Existing methods are mainly based on Pixel-aligned Implicit Functions (PIFu) or Neural Radiance Fields (NeRF). However, while PIFu-based methods usually fail to produce photorealistic view-dependent textures, NeRF-based methods typically lack local geometry accuracy and are computationally heavy (e.g., dense sampling of 3D points, additional fine-tuning, or pose estimation). In this work, we propose a novel generalizable method, named SAILOR, to create high-quality human free-view videos from very sparse RGBD live streams. To produce view-dependent textures while preserving locally accurate geometry, we integrate PIFu and NeRF such that they work synergistically by conditioning the PIFu on depth and then rendering view-dependent textures through NeRF. Specifically, we propose a novel network, named SRONet, for this hybrid representation. SRONet can handle unseen performers without fine-tuning. Besides, a neural blending-based ray interpolation approach, a tree-based voxel-denoising scheme, and a parallel computing pipeline are incorporated to reconstruct and render live free-view videos at 10 fps on average. To evaluate the rendering performance, we construct a real-captured RGBD benchmark from 40 performers. Experimental results show that SAILOR outperforms existing human reconstruction and performance capture methods.","PeriodicalId":7077,"journal":{"name":"ACM Transactions on Graphics (TOG)","volume":"12 13","pages":"1 - 15"},"PeriodicalIF":0.0000,"publicationDate":"2023-12-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"SAILOR: Synergizing Radiance and Occupancy Fields for Live Human Performance Capture\",\"authors\":\"Zheng Dong, Ke Xu, Yaoan Gao, Qilin Sun, Hujun Bao, Weiwei Xu, Rynson W. H. Lau\",\"doi\":\"10.1145/3618370\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Immersive user experiences in live VR/AR performances require a fast and accurate free-view rendering of the performers. Existing methods are mainly based on Pixel-aligned Implicit Functions (PIFu) or Neural Radiance Fields (NeRF). However, while PIFu-based methods usually fail to produce photorealistic view-dependent textures, NeRF-based methods typically lack local geometry accuracy and are computationally heavy (e.g., dense sampling of 3D points, additional fine-tuning, or pose estimation). In this work, we propose a novel generalizable method, named SAILOR, to create high-quality human free-view videos from very sparse RGBD live streams. To produce view-dependent textures while preserving locally accurate geometry, we integrate PIFu and NeRF such that they work synergistically by conditioning the PIFu on depth and then rendering view-dependent textures through NeRF. Specifically, we propose a novel network, named SRONet, for this hybrid representation. SRONet can handle unseen performers without fine-tuning. Besides, a neural blending-based ray interpolation approach, a tree-based voxel-denoising scheme, and a parallel computing pipeline are incorporated to reconstruct and render live free-view videos at 10 fps on average. To evaluate the rendering performance, we construct a real-captured RGBD benchmark from 40 performers. Experimental results show that SAILOR outperforms existing human reconstruction and performance capture methods.\",\"PeriodicalId\":7077,\"journal\":{\"name\":\"ACM Transactions on Graphics (TOG)\",\"volume\":\"12 13\",\"pages\":\"1 - 15\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2023-12-04\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"ACM Transactions on Graphics (TOG)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3618370\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"ACM Transactions on Graphics (TOG)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3618370","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 0

摘要

现场VR/AR表演的沉浸式用户体验需要表演者的快速准确的自由视图渲染。现有的方法主要基于像素对齐隐式函数(PIFu)或神经辐射场(NeRF)。然而，虽然基于pifu的方法通常无法产生逼真的视图依赖纹理，但基于nerf的方法通常缺乏局部几何精度并且计算量很大(例如，3D点的密集采样，额外的微调或姿态估计)。在这项工作中，我们提出了一种新的可推广方法，名为SAILOR，从非常稀疏的RGBD直播流中创建高质量的人类自由观看视频。为了产生依赖于视图的纹理，同时保持局部精确的几何形状，我们整合了PIFu和NeRF，使它们协同工作，通过调节PIFu的深度，然后通过NeRF渲染依赖于视图的纹理。具体来说，我们提出了一个新的网络，命名为SRONet，用于这种混合表示。SRONet无需微调就可以处理看不见的表演者。此外，结合基于神经混合的光线插值方法、基于树的体素去噪方案和并行计算管道，以平均10fps的速度重建和渲染实时自由视频。为了评估渲染性能，我们从40个表演者中构建了一个实时捕获的RGBD基准。实验结果表明，该方法优于现有的人体重建和性能捕获方法。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

SAILOR: Synergizing Radiance and Occupancy Fields for Live Human Performance Capture

Immersive user experiences in live VR/AR performances require a fast and accurate free-view rendering of the performers. Existing methods are mainly based on Pixel-aligned Implicit Functions (PIFu) or Neural Radiance Fields (NeRF). However, while PIFu-based methods usually fail to produce photorealistic view-dependent textures, NeRF-based methods typically lack local geometry accuracy and are computationally heavy (e.g., dense sampling of 3D points, additional fine-tuning, or pose estimation). In this work, we propose a novel generalizable method, named SAILOR, to create high-quality human free-view videos from very sparse RGBD live streams. To produce view-dependent textures while preserving locally accurate geometry, we integrate PIFu and NeRF such that they work synergistically by conditioning the PIFu on depth and then rendering view-dependent textures through NeRF. Specifically, we propose a novel network, named SRONet, for this hybrid representation. SRONet can handle unseen performers without fine-tuning. Besides, a neural blending-based ray interpolation approach, a tree-based voxel-denoising scheme, and a parallel computing pipeline are incorporated to reconstruct and render live free-view videos at 10 fps on average. To evaluate the rendering performance, we construct a real-captured RGBD benchmark from 40 performers. Experimental results show that SAILOR outperforms existing human reconstruction and performance capture methods.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

ACM Transactions on Graphics (TOG)

自引率

0.00%

发文量