A Semantic Talking Style Space for Speech-driven Facial Animation.

Impact factor: 6.5

IEEE Transactions on Visualization and Computer Graphics. Pub Date: 2025-09-29. DOI: 10.1109/TVCG.2025.3615390

Yujin Chai, Yanlin Weng, Tianjia Shao, Kun Zhou

Citations: 0

Abstract

We present a latent talking style space with semantic meanings for speech-driven 3D facial animation. The style space is learned from 3D speech facial animations via a self-supervision paradigm without any style labeling, leading to an automatic separation of high-level attributes, i.e., different channels of the latent style code possess different semantic meanings, such as a wide/slightly open mouth, a grinning/round mouth, and frowning/raising eyebrows. The style space enables intuitive and flexible control of talking styles in speech-driven facial animation through manipulating the channels of style code. To effectively learn such a style space, we propose a two-stage approach, involving two deep neural networks, to disentangle the person identity, speech content, and talking style contained in 3D speech facial animations. The training is performed on a novel dataset of 3D talking faces of various styles, constructed from over ten hours of videos of 200 subjects collected from the Internet.
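The channel-wise control described above can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the abstract does not state the dimensionality of the style code, which channel carries which meaning, or how the code is fed to the animation network, so the 4-channel code, the `edit_style_channel` helper, and the "channel 0 is mouth openness" mapping are all hypothetical.

```python
from typing import List


def edit_style_channel(style_code: List[float], channel: int, delta: float) -> List[float]:
    """Return a copy of style_code with a single channel shifted by delta.

    Hypothetical helper: it only illustrates the idea of editing one
    semantic channel while leaving the others untouched.
    """
    if not 0 <= channel < len(style_code):
        raise IndexError(
            f"channel {channel} out of range for a code of length {len(style_code)}"
        )
    edited = list(style_code)  # copy so the input code is not modified
    edited[channel] += delta
    return edited


# Hypothetical 4-channel style code; the real size is not given in the abstract.
neutral = [0.0, 0.0, 0.0, 0.0]

# Assumed for illustration only: channel 0 controls how wide the mouth opens.
wider_mouth = edit_style_channel(neutral, channel=0, delta=1.5)
```

In a real system the edited code would be passed, together with the speech input and an identity, to the animation network; that step is omitted because the abstract does not describe its interface.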



