Improving HMM Based Speech Synthesis by Reducing Over-Smoothing Problems

2008 6th International Symposium on Chinese Spoken Language Processing. Pub Date: 2008-12-01. DOI: 10.1109/CHINSL.2008.ECP.16

Meng Zhang, J. Tao, Huibin Jia, Xia Wang

Citations: 12

Abstract

Although hidden Markov model (HMM) based speech synthesis has been shown to perform well, several factors still degrade the quality of the synthesized speech: the vocoder, model accuracy, and over-smoothing. This paper analyzes each of these factors separately and proposes modifications for removing each of them. Experimental results show that over-smoothing in the frequency domain mainly affects the quality of synthesized speech, whereas over-smoothing in the time domain can nearly be ignored. Time-domain over-smoothing is generally caused by limited accuracy of the model structure, and frequency-domain over-smoothing by limited accuracy of the training algorithm. The model structure currently in use is capable of representing speech without quality degradation, whereas the maximum-likelihood (ML) estimation based parameter training algorithm causes perceptual distortion in the synthesized speech. Modifying the parameter training algorithm is therefore more likely to improve synthesis performance.
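The abstract attributes frequency-domain over-smoothing to ML parameter training. As a toy illustration only (not the authors' experiment or data), the sketch below assumes a single-Gaussian state whose ML mean is the per-bin sample average of its training frames, and uses hypothetical one-peak log-spectral envelopes whose peak position varies slightly from frame to frame. All function names and parameters (`toy_envelope`, `jitter`, the 64-bin envelope) are illustrative assumptions. It compares the frames with their ML mean using two crude sharpness measures: the variance of the envelope across frequency bins and the peak height.

```python
import math
import random

def toy_envelope(n_bins, peak_center, width=3.0, height=1.0):
    """Hypothetical log-spectral envelope: one Gaussian-shaped formant peak on a zero floor."""
    return [height * math.exp(-0.5 * ((k - peak_center) / width) ** 2) for k in range(n_bins)]

def ml_gaussian_mean(frames):
    """ML estimate of a single Gaussian's mean vector: the per-bin sample average."""
    n = len(frames)
    return [sum(f[k] for f in frames) / n for k in range(len(frames[0]))]

def spread_across_frequency(envelope):
    """Variance of the envelope values across frequency bins (a crude sharpness measure)."""
    mu = sum(envelope) / len(envelope)
    return sum((x - mu) ** 2 for x in envelope) / len(envelope)

def demo(n_frames=50, n_bins=64, jitter=6.0, seed=0):
    """Compare individual training frames with the ML mean a state would emit (toy assumption)."""
    rng = random.Random(seed)
    # Each hypothetical training frame has its peak shifted by up to +/- `jitter` bins.
    frames = [toy_envelope(n_bins, 32 + rng.uniform(-jitter, jitter)) for _ in range(n_frames)]
    mean_env = ml_gaussian_mean(frames)
    avg_frame_spread = sum(spread_across_frequency(f) for f in frames) / n_frames
    avg_frame_peak = sum(max(f) for f in frames) / n_frames
    return avg_frame_spread, spread_across_frequency(mean_env), avg_frame_peak, max(mean_env)

avg_spread, mean_spread, avg_peak, mean_peak = demo()
print(f"spread: frames {avg_spread:.4f} vs ML mean {mean_spread:.4f}")
print(f"peak:   frames {avg_peak:.3f} vs ML mean {mean_peak:.3f}")
```

With `jitter=0.0` the two sets of numbers coincide. As the jitter grows, the averaged envelope's peak drops and broadens and its spread across frequency shrinks, which is one simple way to picture the flattening called frequency-domain over-smoothing. The abstract does not state that the paper analyzes the effect in this particular way.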
