Structural maximum a posteriori speaker adaptation of speaking rate-dependent hierarchical prosodic model for Mandarin TTS

2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). Publication date: 2016-03-20. DOI: 10.1109/ICASSP.2016.7472754

I-Bin Liao, Chen-Yu Chiang, Sin-Horng Chen

Citations: 4

Abstract

This paper presents a structural maximum a posteriori (MAP) speaker adaptation method that adapts an existing speaking rate (SR)-dependent hierarchical prosodic model (SR-HPM) to a new speaker's data, so that the new speaker's voice can be synthesized at any given SR. The adaptive SR-HPM is formulated through MAP estimation, with a reference SR-HPM serving as an informative prior. The prior information provided by the reference SR-HPM is organized hierarchically by decision trees. Objective and subjective evaluations showed that the proposed method performed slightly better than the maximum likelihood (ML)-based model within the SR range observed in the target speaker's data, and much better in the unseen SR range.
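The abstract does not give the model equations, so the following is only a hedged, generic illustration of the structural MAP idea, not the authors' SR-HPM formulation. It assumes a toy setting: each leaf of a hypothetical decision tree holds one Gaussian mean from a reference model, and adaptation estimates a single additive offset toward the target speaker. Walking the tree from the root down, each node's MAP offset uses its parent's posterior offset as the prior mean, weighted by a hypothetical pseudo-count `tau`: offset = (tau * parent_offset + sum of residuals) / (tau + n). A node with no target data simply inherits its parent's estimate, which is one way a hierarchical prior can help in conditions (such as an unseen SR range) that the adaptation data does not cover. The names `Node`, `smap_offsets`, `adapted_means`, the leaf labels, and all numbers are assumptions made for this sketch.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    """A node of a hypothetical decision tree over model parameters.

    Leaves carry a reference-model mean and the target speaker's samples;
    internal nodes only group their children.
    """
    name: str
    children: list[Node] = field(default_factory=list)
    ref_mean: float = 0.0                               # reference mean (leaves only)
    samples: list[float] = field(default_factory=list)  # target data (leaves only)


def residuals(node: Node) -> list[float]:
    """Deviations of target samples from reference means, pooled over the subtree."""
    if not node.children:
        return [x - node.ref_mean for x in node.samples]
    pooled: list[float] = []
    for child in node.children:
        pooled.extend(residuals(child))
    return pooled


def smap_offsets(node: Node, parent_offset: float = 0.0, tau: float = 5.0,
                 result: dict[str, float] | None = None) -> dict[str, float]:
    """Top-down MAP estimate of a mean offset at every node (tau must be > 0).

    The prior mean of each node is its parent's posterior offset; with no data
    (n == 0) the estimate reduces to the parent's offset.
    """
    if result is None:
        result = {}
    r = residuals(node)
    offset = (tau * parent_offset + sum(r)) / (tau + len(r))
    result[node.name] = offset
    for child in node.children:
        smap_offsets(child, offset, tau, result)
    return result


def adapted_means(node: Node, offsets: dict[str, float]) -> dict[str, float]:
    """Adapted leaf means: reference mean plus the leaf's MAP offset."""
    if not node.children:
        return {node.name: node.ref_mean + offsets[node.name]}
    out: dict[str, float] = {}
    for child in node.children:
        out.update(adapted_means(child, offsets))
    return out


if __name__ == "__main__":
    tree = Node("root", children=[
        Node("fast", ref_mean=1.0, samples=[1.25, 1.75]),
        Node("slow", ref_mean=2.0),  # no target data: an "unseen" condition
    ])
    offsets = smap_offsets(tree, tau=2.0)
    print(adapted_means(tree, offsets))  # {'fast': 1.375, 'slow': 2.25}
```

In this toy run the "slow" leaf has no adaptation data, so it inherits the root's offset of 0.25 instead of staying at the reference value, whereas a pure ML estimate would have nothing to go on for that leaf. Larger `tau` values keep estimates closer to the prior; smaller values let the data dominate.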
