Speaker adaptation of speaking rate-dependent hierarchical prosodic model for Mandarin TTS

International Symposium on Chinese Spoken Language Processing. Pub Date: 2014-10-27. DOI: 10.1109/ISCSLP.2014.6936616

Po-Chun Wang, I-Bin Liao, Chen-Yu Chiang, Yih-Ru Wang, Sin-Horng Chen


Citations: 8

Abstract

This paper proposes a speaker adaptation method that adapts an existing speaking rate-dependent hierarchical prosodic model (SR-HPM) of an SR-controlled Mandarin TTS system to a new speaker's data in order to realize a new voice. Two main problems are addressed: data sparseness, because the few adaptation utterances fall only within a small range of normal speaking rates, and the absence of adaptation data in both the fast and slow speaking-rate ranges. The proposed method follows the idea of SR-HPM training: it first normalizes the prosodic-acoustic features of the new speaker's speech data, then trains an HPM with the prosody labeling and modeling algorithm, and finally refines the HPM into an SR-dependent model. MAP adaptation with model parameter extrapolation is applied to cope with these two problems. Experimental results on a male speaker's adaptation data confirmed that the resulting adaptive SR-HPM has reasonable parameters covering a wide range of speaking rates and hence can be used in the TTS system to generate prosodic-acoustic features for synthesizing the new speaker's voice at any given SR.
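The abstract does not give the adaptation formulas, so the following is only a generic sketch of the two ideas it names, not the authors' algorithm. It shows the textbook MAP estimate of a Gaussian mean, which stays close to the prior when adaptation data are sparse, and one simple way to extrapolate to speaking rates with no data. The linear SR dependence, the constant-offset extrapolation, the relevance factor `tau`, and all function names and numbers are assumptions made for illustration.

```python
def map_adapt_mean(prior_mean, samples, tau=10.0):
    """MAP estimate of a Gaussian mean with a prior centred on prior_mean.

    tau is the prior weight (relevance factor): with few samples the
    estimate stays near the prior; with many it approaches the sample mean.
    """
    n = len(samples)
    if n == 0:
        return prior_mean  # no adaptation data: fall back to the prior
    return (tau * prior_mean + sum(samples)) / (tau + n)


def prior_mean_at(sr, slope, intercept):
    """Hypothetical SR-dependent parameter: linear in speaking rate."""
    return slope * sr + intercept


def adapt_sr_dependent(slope, intercept, data_by_sr, tau=10.0):
    """Adapt the linear SR-dependent mean from data in a narrow SR range.

    Assumption of this sketch: the new speaker differs from the existing
    model by a constant offset. The offset is estimated by MAP at the
    observed SRs and then carried over to SRs with no adaptation data.
    """
    weighted_offset = 0.0
    total = 0
    for sr, samples in data_by_sr.items():
        prior = prior_mean_at(sr, slope, intercept)
        adapted = map_adapt_mean(prior, samples, tau)
        weighted_offset += (adapted - prior) * len(samples)
        total += len(samples)
    offset = weighted_offset / total if total else 0.0
    return slope, intercept + offset


if __name__ == "__main__":
    # Hypothetical prior: mean syllable duration (s) versus SR (syllables/s).
    slope, intercept = -0.04, 0.40
    # Hypothetical adaptation data, available only near the normal SR.
    data = {5.0: [0.25, 0.27, 0.26], 5.2: [0.24, 0.26]}
    new_slope, new_intercept = adapt_sr_dependent(slope, intercept, data, tau=5.0)
    for sr in (3.5, 5.0, 7.0):  # slow, normal, fast
        print(sr, prior_mean_at(sr, new_slope, new_intercept))
```

Keeping the slope of the existing model and shifting only the intercept is the simplest possible extrapolation rule. It is used here because the observed SR range is too narrow to re-estimate a slope reliably, and the paper may well use a different scheme.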

