{"title":"语音驱动面部动画的语义谈话风格空间。","authors":"Yujin Chai, Yanlin Weng, Tianjia Shao, Kun Zhou","doi":"10.1109/TVCG.2025.3615390","DOIUrl":null,"url":null,"abstract":"<p><p>We present a latent talking style space with semantic meanings for speech-driven 3D facial animation. The style space is learned from 3D speech facial animations via a self-supervision paradigm without any style labeling, leading to an automatic separation of high-level attributes, i.e., different channels of the latent style code possess different semantic meanings, such as a wide/slightly open mouth, a grinning/round mouth, and frowning/raising eyebrows. The style space enables intuitive and flexible control of talking styles in speech-driven facial animation through manipulating the channels of style code. To effectively learn such a style space, we propose a two-stage approach, involving two deep neural networks, to disentangle the person identity, speech content, and talking style contained in 3D speech facial animations. The training is performed on a novel dataset of 3D talking faces of various styles, constructed from over ten hours of videos of 200 subjects collected from the Internet.</p>","PeriodicalId":94035,"journal":{"name":"IEEE transactions on visualization and computer graphics","volume":"PP ","pages":""},"PeriodicalIF":6.5000,"publicationDate":"2025-09-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"A Semantic Talking Style Space for Speech-driven Facial Animation.\",\"authors\":\"Yujin Chai, Yanlin Weng, Tianjia Shao, Kun Zhou\",\"doi\":\"10.1109/TVCG.2025.3615390\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p><p>We present a latent talking style space with semantic meanings for speech-driven 3D facial animation. The style space is learned from 3D speech facial animations via a self-supervision paradigm without any style labeling, leading to an automatic separation of high-level attributes, i.e., different channels of the latent style code possess different semantic meanings, such as a wide/slightly open mouth, a grinning/round mouth, and frowning/raising eyebrows. The style space enables intuitive and flexible control of talking styles in speech-driven facial animation through manipulating the channels of style code. To effectively learn such a style space, we propose a two-stage approach, involving two deep neural networks, to disentangle the person identity, speech content, and talking style contained in 3D speech facial animations. 
The training is performed on a novel dataset of 3D talking faces of various styles, constructed from over ten hours of videos of 200 subjects collected from the Internet.</p>\",\"PeriodicalId\":94035,\"journal\":{\"name\":\"IEEE transactions on visualization and computer graphics\",\"volume\":\"PP \",\"pages\":\"\"},\"PeriodicalIF\":6.5000,\"publicationDate\":\"2025-09-29\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"IEEE transactions on visualization and computer graphics\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/TVCG.2025.3615390\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE transactions on visualization and computer graphics","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/TVCG.2025.3615390","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
A Semantic Talking Style Space for Speech-driven Facial Animation.
We present a latent talking style space with semantic meanings for speech-driven 3D facial animation. The style space is learned from 3D talking-face animations via a self-supervised paradigm, without any style labeling, which leads to an automatic separation of high-level attributes: different channels of the latent style code carry different semantic meanings, such as a wide/slightly open mouth, a grinning/round mouth, and frowning/raising eyebrows. The style space enables intuitive and flexible control of talking styles in speech-driven facial animation by manipulating the channels of the style code. To learn such a style space effectively, we propose a two-stage approach, involving two deep neural networks, that disentangles the person identity, speech content, and talking style contained in 3D talking-face animations. Training is performed on a novel dataset of 3D talking faces in various styles, constructed from over ten hours of Internet videos of 200 subjects.
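To make the control mechanism concrete, below is a minimal, self-contained sketch (not the authors' code) of channel-wise style editing: a stand-in decoder maps disentangled identity, speech, and style codes to per-frame 3D vertex offsets, and nudging a single channel of the style code changes one semantic attribute while the speech-dependent motion is unchanged. All names, dimensions, and the channel index are illustrative assumptions; the real decoder is a deep network trained on the authors' dataset.

```python
# Illustrative sketch of semantic style-code editing; every name,
# dimension, and index here is an assumption, not the paper's API.
import numpy as np

STYLE_DIM = 16   # assumed size of the latent style code
VERTS = 5023     # assumed vertex count of the face template

def decode_animation(identity_code, speech_features, style_code):
    """Stand-in for the paper's decoder: maps disentangled codes to
    per-frame 3D vertex offsets. A fixed random linear map keeps the
    sketch runnable; identity_code is accepted but ignored here."""
    rng = np.random.default_rng(0)
    w = rng.standard_normal((STYLE_DIM, VERTS * 3)) * 1e-3
    frames = speech_features.shape[0]
    base = rng.standard_normal((frames, VERTS * 3)) * 1e-3  # content-driven motion
    return (base + style_code @ w).reshape(frames, VERTS, 3)

# Per the abstract, each channel of the style code carries a semantic
# meaning (e.g., mouth openness, eyebrow raise), so editing one channel
# alters one attribute while leaving the spoken content intact.
identity = np.zeros(8)           # assumed identity code
speech = np.zeros((120, 64))     # assumed 120 frames of audio features
style = np.zeros(STYLE_DIM)

MOUTH_OPEN_CH = 3                        # hypothetical channel index
style_wide_mouth = style.copy()
style_wide_mouth[MOUTH_OPEN_CH] += 2.0   # exaggerate mouth openness

neutral = decode_animation(identity, speech, style)
wide = decode_animation(identity, speech, style_wide_mouth)
print(neutral.shape, np.abs(wide - neutral).max())
```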