Real-time facial reconstruction and expression replacement based on neural radiation field

IF 3.6

Systems and Soft Computing, Pub Date: 2025-01-04, DOI: 10.1016/j.sasc.2025.200185

Shenning Zhang, Hui Li, Xuefeng Tian


Citations: 0

Abstract



Neural Radiance Fields (NeRF) have made high-fidelity 3D facial reconstruction and novel view synthesis possible, and have become an important tool in 3D vision. However, existing manipulation approaches require substantial human involvement: users must provide semantic masks or search for attributes manually, which is inconvenient for non-experts. Our approach enables manipulation of NeRF-reconstructed faces from a single text input. To this end, this paper first trains a scene manipulator, a conditional NeRF with deformable latent codes, on dynamic scenes so that facial deformations can be controlled through the latent codes. A single latent code for the whole scene, however, is poorly suited to synthesizing local deformations in varied contexts. This paper therefore proposes a text-driven manipulation pipeline for facial reconstruction with NeRF, develops a manipulation network that learns to represent scene changes with latent codes that vary across spatial locations, and integrates a WeChat mini-program to support practical use. This makes it easy for non-expert users to synthesize novel views. The method offers users a simple and convenient text-driven way to manipulate reconstructed 3D faces.
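The idea of latent codes that vary across spatial locations can be illustrated with a minimal sketch. This is not the authors' network: the anchor codes, their centres, the distance-based weighting, and all function names below are assumptions made for illustration. In a trained system the blending weights would presumably be predicted by a learned network rather than computed from fixed distances, and the resulting per-point code would condition the NeRF deformation.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def blend_weights(point, centers, sharpness=4.0):
    """Hypothetical weighting: anchors centred closer to the point get more weight."""
    scores = [
        -sharpness * sum((p - c) ** 2 for p, c in zip(point, center))
        for center in centers
    ]
    return softmax(scores)


def local_latent(point, centers, anchor_codes, sharpness=4.0):
    """Blend the anchor codes into one latent code for this 3D point."""
    weights = blend_weights(point, centers, sharpness)
    dim = len(anchor_codes[0])
    return [
        sum(w * code[d] for w, code in zip(weights, anchor_codes))
        for d in range(dim)
    ]


# Two illustrative anchors: one near the mouth, one near the eyes.
centers = [(0.0, -0.5, 0.0), (0.0, 0.5, 0.0)]
anchor_codes = [[1.0, 0.0], [0.0, 1.0]]  # toy 2-D latent codes

mouth_code = local_latent((0.0, -0.5, 0.0), centers, anchor_codes)
eye_code = local_latent((0.0, 0.5, 0.0), centers, anchor_codes)
mid_code = local_latent((0.0, 0.0, 0.0), centers, anchor_codes)
```

With a single global code, every point would receive the same vector, so a change meant for the mouth would also affect the eyes. Here a point near the mouth is dominated by the first anchor, a point near the eyes by the second, and a point halfway between them receives an equal mix.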


Source journal

Systems and Soft Computing

CiteScore

2.20

Self-citation rate

0.00%

Number of articles published