4D Facial Expression Diffusion Model

IF 5.2 · CAS Tier 3 (Computer Science) · JCR Q1, COMPUTER SCIENCE, INFORMATION SYSTEMS
Kaifeng Zou, Sylvain Faisan, Boyang Yu, Sébastien Valette, Hyewon Seo
{"title":"4D 面部表情扩散模型","authors":"Kaifeng Zou, Sylvain Faisan, Boyang Yu, Sébastien Valette, Hyewon Seo","doi":"10.1145/3653455","DOIUrl":null,"url":null,"abstract":"<p>Facial expression generation is one of the most challenging and long-sought aspects of character animation, with many interesting applications. The challenging task, traditionally having relied heavily on digital craftspersons, remains yet to be explored. In this paper, we introduce a generative framework for generating 3D facial expression sequences (i.e. 4D faces) that can be conditioned on different inputs to animate an arbitrary 3D face mesh. It is composed of two tasks: (1) Learning the generative model that is trained over a set of 3D landmark sequences, and (2) Generating 3D mesh sequences of an input facial mesh driven by the generated landmark sequences. The generative model is based on a Denoising Diffusion Probabilistic Model (DDPM), which has achieved remarkable success in generative tasks of other domains. While it can be trained unconditionally, its reverse process can still be conditioned by various condition signals. This allows us to efficiently develop several downstream tasks involving various conditional generation, by using expression labels, text, partial sequences, or simply a facial geometry. To obtain the full mesh deformation, we then develop a landmark-guided encoder-decoder to apply the geometrical deformation embedded in landmarks on a given facial mesh. Experiments show that our model has learned to generate realistic, quality expressions solely from the dataset of relatively small size, improving over the state-of-the-art methods. Videos and qualitative comparisons with other methods can be found at https://github.com/ZOUKaifeng/4DFM. Code and models will be made available upon acceptance.</p>","PeriodicalId":50937,"journal":{"name":"ACM Transactions on Multimedia Computing Communications and Applications","volume":"53 1","pages":""},"PeriodicalIF":5.2000,"publicationDate":"2024-03-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"4D Facial Expression Diffusion Model\",\"authors\":\"Kaifeng Zou, Sylvain Faisan, Boyang Yu, Sébastien Valette, Hyewon Seo\",\"doi\":\"10.1145/3653455\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p>Facial expression generation is one of the most challenging and long-sought aspects of character animation, with many interesting applications. The challenging task, traditionally having relied heavily on digital craftspersons, remains yet to be explored. In this paper, we introduce a generative framework for generating 3D facial expression sequences (i.e. 4D faces) that can be conditioned on different inputs to animate an arbitrary 3D face mesh. It is composed of two tasks: (1) Learning the generative model that is trained over a set of 3D landmark sequences, and (2) Generating 3D mesh sequences of an input facial mesh driven by the generated landmark sequences. The generative model is based on a Denoising Diffusion Probabilistic Model (DDPM), which has achieved remarkable success in generative tasks of other domains. While it can be trained unconditionally, its reverse process can still be conditioned by various condition signals. This allows us to efficiently develop several downstream tasks involving various conditional generation, by using expression labels, text, partial sequences, or simply a facial geometry. 
To obtain the full mesh deformation, we then develop a landmark-guided encoder-decoder to apply the geometrical deformation embedded in landmarks on a given facial mesh. Experiments show that our model has learned to generate realistic, quality expressions solely from the dataset of relatively small size, improving over the state-of-the-art methods. Videos and qualitative comparisons with other methods can be found at https://github.com/ZOUKaifeng/4DFM. Code and models will be made available upon acceptance.</p>\",\"PeriodicalId\":50937,\"journal\":{\"name\":\"ACM Transactions on Multimedia Computing Communications and Applications\",\"volume\":\"53 1\",\"pages\":\"\"},\"PeriodicalIF\":5.2000,\"publicationDate\":\"2024-03-28\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"ACM Transactions on Multimedia Computing Communications and Applications\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://doi.org/10.1145/3653455\",\"RegionNum\":3,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"COMPUTER SCIENCE, INFORMATION SYSTEMS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"ACM Transactions on Multimedia Computing Communications and Applications","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.1145/3653455","RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, INFORMATION SYSTEMS","Score":null,"Total":0}
Citations: 0

Abstract


Facial expression generation is one of the most challenging and long-sought aspects of character animation, with many interesting applications. This challenging task has traditionally relied heavily on digital craftspersons and remains largely unexplored by automated methods. In this paper, we introduce a generative framework for producing 3D facial expression sequences (i.e., 4D faces) that can be conditioned on different inputs to animate an arbitrary 3D face mesh. It comprises two tasks: (1) learning a generative model over a set of 3D landmark sequences, and (2) generating 3D mesh sequences for an input facial mesh, driven by the generated landmark sequences. The generative model is based on a Denoising Diffusion Probabilistic Model (DDPM), which has achieved remarkable success in generative tasks in other domains. Although it can be trained unconditionally, its reverse process can still be conditioned on various signals. This allows us to efficiently develop several downstream conditional-generation tasks using expression labels, text, partial sequences, or simply a facial geometry. To obtain the full mesh deformation, we then develop a landmark-guided encoder-decoder that transfers the geometric deformation encoded in the landmarks onto a given facial mesh. Experiments show that our model learns to generate realistic, high-quality expressions from a relatively small dataset, improving over state-of-the-art methods. Videos and qualitative comparisons with other methods are available at https://github.com/ZOUKaifeng/4DFM. Code and models will be made available upon acceptance.
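
The conditioning mechanism described in the abstract (an unconditionally trained DDPM whose reverse process is steered at sampling time) can be illustrated with a short sketch. The code below is not the authors' implementation; it is a minimal, assumed PyTorch version of DDPM ancestral sampling over a landmark sequence, with partial-sequence conditioning done by imputation (re-injecting the known frames at each reverse step). All names (`eps_model`, `betas`, `mask`) are illustrative.

```python
# Minimal sketch of conditional DDPM sampling for 3D landmark sequences.
# NOT the paper's code: eps_model, the beta schedule, and the imputation
# scheme for partial sequences are all assumptions for illustration.
import torch

@torch.no_grad()
def sample_landmarks(eps_model, betas, shape, known=None, mask=None):
    """shape: (batch, frames, n_landmarks * 3). If `known`/`mask` are given,
    observed frames (mask == 1) are pinned so sampling completes a partial
    sequence; with mask=None this is plain unconditional generation."""
    alphas = 1.0 - betas
    alpha_bar = torch.cumprod(alphas, dim=0)
    x = torch.randn(shape)                                   # x_T ~ N(0, I)
    for t in reversed(range(len(betas))):
        if known is not None:
            # imputation: diffuse observed frames to noise level t, overwrite
            x_known = torch.sqrt(alpha_bar[t]) * known + \
                      torch.sqrt(1.0 - alpha_bar[t]) * torch.randn_like(x)
            x = mask * x_known + (1.0 - mask) * x
        t_batch = torch.full((shape[0],), t, dtype=torch.long)
        eps = eps_model(x, t_batch)                          # predicted noise
        mean = (x - betas[t] / torch.sqrt(1.0 - alpha_bar[t]) * eps) \
               / torch.sqrt(alphas[t])
        noise = torch.randn_like(x) if t > 0 else torch.zeros_like(x)
        x = mean + torch.sqrt(betas[t]) * noise              # x_{t-1}
    if known is not None:
        x = mask * known + (1.0 - mask) * x    # keep observed frames exact
    return x

# typical usage (all numbers illustrative):
# betas = torch.linspace(1e-4, 0.02, 1000)
# seq = sample_landmarks(model, betas, (1, 60, 68 * 3))
```

Label- or text-conditioned variants would instead pass a condition embedding into `eps_model` itself; the imputation path above corresponds only to the partial-sequence case.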
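For the second task, the landmark-guided encoder-decoder, a similarly hedged sketch: the abstract does not specify this architecture, so the network below is a plain MLP stand-in that maps a neutral mesh plus one landmark frame to per-vertex displacements; all dimensions and names are placeholders.

```python
# Assumed stand-in for a landmark-guided encoder-decoder: sparse landmark
# motion drives a dense per-vertex displacement field. Dimensions are
# placeholders, not the paper's.
import torch
import torch.nn as nn

N_VERTS, N_LMK, LATENT = 5023, 68, 256    # illustrative sizes

class LandmarkGuidedDeformer(nn.Module):
    def __init__(self):
        super().__init__()
        # encoder: neutral mesh + landmark frame -> latent code
        self.encoder = nn.Sequential(
            nn.Linear(N_VERTS * 3 + N_LMK * 3, 1024), nn.ReLU(),
            nn.Linear(1024, LATENT), nn.ReLU())
        # decoder: latent code -> per-vertex displacements
        self.decoder = nn.Sequential(
            nn.Linear(LATENT, 1024), nn.ReLU(),
            nn.Linear(1024, N_VERTS * 3))

    def forward(self, neutral_verts, landmarks):
        # neutral_verts: (B, N_VERTS, 3); landmarks: (B, N_LMK, 3)
        b = neutral_verts.shape[0]
        z = self.encoder(torch.cat(
            [neutral_verts.reshape(b, -1), landmarks.reshape(b, -1)], dim=-1))
        disp = self.decoder(z).reshape(b, N_VERTS, 3)
        return neutral_verts + disp       # deformed mesh for this frame

# Applied frame-by-frame, a generated landmark sequence yields a 4D mesh:
# mesh_seq = [deformer(neutral, lmk[None]) for lmk in landmark_sequence]
```

Decoding per frame keeps the deformer independent of sequence length; temporal coherence comes from the landmark sequence produced by the diffusion model.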

Source journal: ACM Transactions on Multimedia Computing Communications and Applications
CiteScore: 8.50
Self-citation rate: 5.90%
Articles published per year: 285
Review time: 7.5 months
Journal description: The ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM) is the flagship publication of the ACM Special Interest Group in Multimedia (SIGMM). It solicits paper submissions on all aspects of multimedia; papers on single media (for instance, audio, video, animation) and their processing are also welcome. TOMM is a peer-reviewed, archival journal, available in both print and digital form. The journal is published quarterly, with roughly seven 23-page articles per issue; in addition, all special issues are published online-only to ensure timely publication. The transactions consist primarily of research papers. As an archival journal, it is intended that the papers will have lasting importance and value over time. In general, papers whose primary focus is on particular multimedia products or the current state of the industry will not be included.