The Phoneme-Level Articulator Dynamics for Pronunciation Animation

2011 International Conference on Asian Language Processing. Pub Date: 2011-11-15. DOI: 10.1109/IALP.2011.13

Sheng Li, Lan Wang, En Qi

Citations: 5

Abstract

Speech visualization can be extended to the task of pronunciation animation for language learners. In this paper, a three-dimensional English articulation database is recorded using a Carstens Electro-Magnetic Articulograph (EMA AG500). An HMM-based visual synthesis method for continuous speech is implemented to recover 3D articulatory information. The synthesized articulations are then compared to the EMA recordings for objective evaluation. Using a data-driven 3D talking head, the distinctions between confusable phonemes can be depicted through both external and internal articulatory movements. The experiments demonstrate that HMM-based synthesis with limited training data can achieve a minimum RMS error of less than 2 mm. The synthesized articulatory movements can be used for computer-assisted pronunciation training.
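The objective evaluation mentioned in the abstract compares synthesized articulations against EMA recordings using an RMS error. The abstract does not give the exact error definition, so the following is only a minimal sketch under one plausible assumption: the error is the root mean square of the per-frame Euclidean distance between a synthesized and a recorded 3D sensor trajectory, both in millimetres and already time-aligned. The function name `rms_error` and the sample coordinates are hypothetical and are not taken from the paper.

```python
import math


def rms_error(synth, recorded):
    """RMS of per-frame Euclidean distances between two 3D trajectories.

    Both arguments are sequences of (x, y, z) tuples of equal length,
    assumed to be time-aligned and expressed in the same unit (e.g. mm).
    This definition is an assumption, not the paper's stated formula.
    """
    if len(synth) != len(recorded) or len(synth) == 0:
        raise ValueError("trajectories must be non-empty and of equal length")
    total = 0.0
    for (sx, sy, sz), (rx, ry, rz) in zip(synth, recorded):
        # Squared Euclidean distance for this frame.
        total += (sx - rx) ** 2 + (sy - ry) ** 2 + (sz - rz) ** 2
    return math.sqrt(total / len(synth))


# Made-up illustrative positions in mm (not data from the paper).
recorded = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 1.0, 0.0)]
synth = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0), (2.0, 1.0, 1.0)]
error_mm = rms_error(synth, recorded)
```

In this toy case every synthesized frame is offset by exactly 1 mm along one axis, so the RMS error is 1 mm. With real data the error would typically be computed per sensor and then averaged or reported separately.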


