Improving HMM Based Speech Synthesis by Reducing Over-Smoothing Problems

2008 6th International Symposium on Chinese Spoken Language Processing. Pub Date: 2008-12-01. DOI: 10.1109/CHINSL.2008.ECP.16

Meng Zhang, J. Tao, Huibin Jia, Xia Wang


Citations: 12

Abstract

Although hidden Markov model (HMM) based speech synthesis has been shown to perform well, several factors still degrade the quality of the synthesized speech: the vocoder, model accuracy, and over-smoothing. This paper analyzes these factors separately and proposes modifications to address each of them. Experimental results show that over-smoothing in the frequency domain is the main factor affecting the quality of synthesized speech, whereas over-smoothing in the time domain can nearly be ignored. Time-domain over-smoothing is generally caused by limited accuracy of the model structure, and frequency-domain over-smoothing is caused by limited accuracy of the training algorithm. The currently used model structure is capable of representing speech without quality degradation. The parameter training algorithm based on maximum likelihood (ML) estimation, however, causes perceptual distortion in the synthesized speech. Modifications that improve the parameter training algorithm are therefore more likely to improve synthesis performance.
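As a rough illustration of how ML-style averaging can flatten spectral detail, here is a toy sketch. It is not the paper's method or experiment: the single-peak envelope shape, the number of bins, and the peak positions are all assumptions chosen for illustration. The ML estimate of a Gaussian mean is the per-bin sample average, and averaging envelopes whose peaks sit at slightly different frequency bins gives a lower, broader peak than any individual frame has.

```python
import math

def frame(center, n_bins=64, width=3.0):
    """Hypothetical spectral envelope: one Gaussian-shaped peak at bin `center`."""
    return [math.exp(-0.5 * ((k - center) / width) ** 2) for k in range(n_bins)]

def ml_mean(frames):
    """ML estimate of a Gaussian mean vector: the per-bin sample average."""
    n = len(frames)
    return [sum(f[k] for f in frames) / n for k in range(len(frames[0]))]

def dynamic_range(envelope):
    """Peak-to-valley difference, used here as a crude measure of spectral detail."""
    return max(envelope) - min(envelope)

# Assumed training frames for one state: the same peak, shifted by a few bins.
centers = [24, 28, 32, 36, 40]
frames = [frame(c) for c in centers]

mean_env = ml_mean(frames)
avg_frame_range = sum(dynamic_range(f) for f in frames) / len(frames)
mean_range = dynamic_range(mean_env)

print(f"average dynamic range of the frames: {avg_frame_range:.3f}")
print(f"dynamic range of the ML mean:        {mean_range:.3f}")
```

In this toy setup every frame has a sharp peak, yet the averaged envelope is much flatter, which is the kind of frequency-domain smoothing the abstract attributes to the training algorithm rather than to the model structure.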

