Visual speech synthesis from 3D mesh sequences driven by combined speech features

2017 IEEE International Conference on Multimedia and Expo (ICME), published 2017-07-01. DOI: 10.1109/ICME.2017.8019546

Felix Kuhnke, J. Ostermann

Citations: 6

Abstract

Given a pre-registered 3D mesh sequence and accompanying phoneme-labeled audio, our system creates an animatable face model and a mapping procedure that produces realistic speech animations for arbitrary speech input. Speech features are mapped to model parameters using random forest regression. We propose a new speech feature that combines phonemic labels with acoustic features. This feature produces more expressive facial animation and is robust to temporal labeling errors. Furthermore, because features are extracted with a sliding window, the system is easy to train and supports low-delay synthesis. We show that this combination of speech features improves visual speech synthesis, and a subjective user study confirms our findings.
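The abstract does not specify how the combined feature is built, so the following is only a minimal sketch of one plausible reading of "phonemic labels plus acoustic features, extracted with a sliding window". Each frame is encoded as a one-hot phoneme vector concatenated with that frame's acoustic values, and the per-frame vectors of neighboring frames are then stacked into one input vector. The phoneme inventory, the acoustic values, the function names, the `half_width` parameter, and the edge-clamping at sequence boundaries are all hypothetical choices made for illustration. None of them is taken from the paper.

```python
from typing import List, Sequence

# Hypothetical phoneme inventory for illustration only; the paper's label set is not given.
PHONEMES = ["sil", "a", "m", "o", "p"]


def one_hot(label: str, inventory: Sequence[str]) -> List[float]:
    """Encode a phoneme label as a one-hot vector over the inventory."""
    vec = [0.0] * len(inventory)
    vec[list(inventory).index(label)] = 1.0
    return vec


def frame_feature(label: str, acoustic: Sequence[float],
                  inventory: Sequence[str]) -> List[float]:
    """Concatenate the phonemic (one-hot) and acoustic parts for a single frame."""
    return one_hot(label, inventory) + list(acoustic)


def sliding_window_features(labels: Sequence[str],
                            acoustics: Sequence[Sequence[float]],
                            half_width: int,
                            inventory: Sequence[str] = PHONEMES) -> List[List[float]]:
    """Stack per-frame combined features over a window centered on each frame.

    Frames outside the sequence are clamped to the nearest valid frame
    (edge padding), an assumption made for this sketch. A centered window
    needs `half_width` future frames, which bounds the synthesis delay.
    """
    if len(labels) != len(acoustics):
        raise ValueError("labels and acoustics must have the same number of frames")
    per_frame = [frame_feature(lab, ac, inventory) for lab, ac in zip(labels, acoustics)]
    n = len(per_frame)
    windows: List[List[float]] = []
    for t in range(n):
        window: List[float] = []
        for offset in range(-half_width, half_width + 1):
            idx = min(max(t + offset, 0), n - 1)  # clamp index at the sequence edges
            window.extend(per_frame[idx])
        windows.append(window)
    return windows


# Usage with made-up data: 5 frames, 2 acoustic values per frame.
labels = ["sil", "m", "a", "a", "p"]
acoustics = [[0.1, 0.0], [0.3, 0.2], [0.9, 0.5], [0.8, 0.4], [0.2, 0.1]]
X = sliding_window_features(labels, acoustics, half_width=1)
print(len(X), len(X[0]))  # 5 frames, each 3 * (5 + 2) = 21 values
```

In this sketch, a window spanning neighboring frames means that a slightly misplaced phoneme boundary alters only part of each input vector rather than the whole of it. The resulting rows could then serve as regression inputs, for example to scikit-learn's `RandomForestRegressor.fit(X, y)`, where `y` holds per-frame face model parameters. That pairing is also an assumption and not the authors' stated setup.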
