学习表达性神经头像的潜在表达代码

ACM SIGGRAPH 2023 Conference Proceedings Pub Date : 2023-05-02 DOI:10.1145/3588432.3591545

Yuelang Xu, Hongwen Zhang, Lizhen Wang, Xiaochen Zhao, Han Huang, Guojun Qi, Yebin Liu

{"title":"学习表达性神经头像的潜在表达代码","authors":"Yuelang Xu, Hongwen Zhang, Lizhen Wang, Xiaochen Zhao, Han Huang, Guojun Qi, Yebin Liu","doi":"10.1145/3588432.3591545","DOIUrl":null,"url":null,"abstract":"Existing approaches to animatable NeRF-based head avatars are either built upon face templates or use the expression coefficients of templates as the driving signal. Despite the promising progress, their performances are heavily bound by the expression power and the tracking accuracy of the templates. In this work, we present LatentAvatar, an expressive neural head avatar driven by latent expression codes. Such latent expression codes are learned in an end-to-end and self-supervised manner without templates, enabling our method to get rid of expression and tracking issues. To achieve this, we leverage a latent head NeRF to learn the person-specific latent expression codes from a monocular portrait video, and further design a Y-shaped network to learn the shared latent expression codes of different subjects for cross-identity reenactment. By optimizing the photometric reconstruction objectives in NeRF, the latent expression codes are learned to be 3D-aware while faithfully capturing the high-frequency detailed expressions. Moreover, by learning a mapping between the latent expression code learned in shared and person-specific settings, LatentAvatar is able to perform expressive reenactment between different subjects. Experimental results show that our LatentAvatar is able to capture challenging expressions and the subtle movement of teeth and even eyeballs, which outperforms previous state-of-the-art solutions in both quantitative and qualitative comparisons. Project page: https://www.liuyebin.com/latentavatar.","PeriodicalId":280036,"journal":{"name":"ACM SIGGRAPH 2023 Conference Proceedings","volume":"82 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2023-05-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"14","resultStr":"{\"title\":\"LatentAvatar: Learning Latent Expression Code for Expressive Neural Head Avatar\",\"authors\":\"Yuelang Xu, Hongwen Zhang, Lizhen Wang, Xiaochen Zhao, Han Huang, Guojun Qi, Yebin Liu\",\"doi\":\"10.1145/3588432.3591545\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Existing approaches to animatable NeRF-based head avatars are either built upon face templates or use the expression coefficients of templates as the driving signal. Despite the promising progress, their performances are heavily bound by the expression power and the tracking accuracy of the templates. In this work, we present LatentAvatar, an expressive neural head avatar driven by latent expression codes. Such latent expression codes are learned in an end-to-end and self-supervised manner without templates, enabling our method to get rid of expression and tracking issues. To achieve this, we leverage a latent head NeRF to learn the person-specific latent expression codes from a monocular portrait video, and further design a Y-shaped network to learn the shared latent expression codes of different subjects for cross-identity reenactment. By optimizing the photometric reconstruction objectives in NeRF, the latent expression codes are learned to be 3D-aware while faithfully capturing the high-frequency detailed expressions. Moreover, by learning a mapping between the latent expression code learned in shared and person-specific settings, LatentAvatar is able to perform expressive reenactment between different subjects. Experimental results show that our LatentAvatar is able to capture challenging expressions and the subtle movement of teeth and even eyeballs, which outperforms previous state-of-the-art solutions in both quantitative and qualitative comparisons. Project page: https://www.liuyebin.com/latentavatar.\",\"PeriodicalId\":280036,\"journal\":{\"name\":\"ACM SIGGRAPH 2023 Conference Proceedings\",\"volume\":\"82 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2023-05-02\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"14\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"ACM SIGGRAPH 2023 Conference Proceedings\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3588432.3591545\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"ACM SIGGRAPH 2023 Conference Proceedings","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3588432.3591545","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 14

摘要

现有的基于nerf的可动画头部头像的方法要么建立在面部模板上，要么使用模板的表情系数作为驱动信号。尽管取得了很好的进展，但它们的性能在很大程度上受到模板的表达能力和跟踪精度的限制。在这项工作中，我们提出了LatentAvatar，一个由潜在表达代码驱动的表达性神经头像。这种潜在的表达代码以端到端和自监督的方式学习，没有模板，使我们的方法摆脱了表达和跟踪问题。为了实现这一目标，我们利用潜在头部NeRF从单目人像视频中学习个体特异性潜在表达代码，并进一步设计一个y形网络来学习不同被试的共享潜在表达代码，用于跨身份再现。通过优化NeRF中的光度重建目标，学习潜在表达代码在忠实捕获高频细节表达的同时具有3d感知能力。此外，通过学习在共享和个人特定设置中学习的潜在表达代码之间的映射，LatentAvatar能够在不同的主题之间执行表达再现。实验结果表明，我们的LatentAvatar能够捕捉具有挑战性的表情和牙齿甚至眼球的细微运动，在定量和定性比较中都优于之前最先进的解决方案。项目页面:https://www.liuyebin.com/latentavatar。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

LatentAvatar: Learning Latent Expression Code for Expressive Neural Head Avatar

Existing approaches to animatable NeRF-based head avatars are either built upon face templates or use the expression coefficients of templates as the driving signal. Despite the promising progress, their performances are heavily bound by the expression power and the tracking accuracy of the templates. In this work, we present LatentAvatar, an expressive neural head avatar driven by latent expression codes. Such latent expression codes are learned in an end-to-end and self-supervised manner without templates, enabling our method to get rid of expression and tracking issues. To achieve this, we leverage a latent head NeRF to learn the person-specific latent expression codes from a monocular portrait video, and further design a Y-shaped network to learn the shared latent expression codes of different subjects for cross-identity reenactment. By optimizing the photometric reconstruction objectives in NeRF, the latent expression codes are learned to be 3D-aware while faithfully capturing the high-frequency detailed expressions. Moreover, by learning a mapping between the latent expression code learned in shared and person-specific settings, LatentAvatar is able to perform expressive reenactment between different subjects. Experimental results show that our LatentAvatar is able to capture challenging expressions and the subtle movement of teeth and even eyeballs, which outperforms previous state-of-the-art solutions in both quantitative and qualitative comparisons. Project page: https://www.liuyebin.com/latentavatar.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

ACM SIGGRAPH 2023 Conference Proceedings

自引率

0.00%

发文量