Multi-class learning algorithm for deep neural network-based statistical parametric speech synthesis

2016 24th European Signal Processing Conference (EUSIPCO) | Pub Date: 2016-08-01 | DOI: 10.1109/EUSIPCO.2016.7760589

Eunwoo Song, Hong-Goo Kang

Citations: 2

Abstract

This paper proposes a multi-class learning (MCL) algorithm for a deep neural network (DNN)-based statistical parametric speech synthesis (SPSS) system. Although the DNN-based SPSS system improves the modeling accuracy of statistical parameters, its synthesized speech is often muffled because the training process only considers the global characteristics of the entire set of training data, but does not explicitly consider any local variations. We introduce a DNN-based context clustering algorithm that implicitly divides the training data into several classes, and train them via a shared hidden layer-based MCL algorithm. Since the proposed MCL method efficiently models both the universal and class-dependent characteristics of various phonetic information, it not only avoids the model over-fitting problem but also reduces the over-smoothing effect. Objective and subjective test results also verify that the proposed algorithm performs much better than the conventional method.
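The shared hidden layer idea in the abstract can be illustrated with a small sketch: one hidden layer is updated by every training sample, while each class has its own output layer that is updated only by samples of that class. This is not the authors' implementation. The class name `SharedHiddenMCL`, the single tanh hidden layer, the squared-error loss, the layer sizes, and the learning rate are all assumptions made for illustration, and the class label is passed in by hand rather than produced by the DNN-based context clustering the paper describes.

```python
import math
import random


def init_matrix(rows, cols, rng):
    """Small random weights; the range [-0.1, 0.1] is an arbitrary choice."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


class SharedHiddenMCL:
    """Hypothetical model: one shared hidden layer, one linear head per class."""

    def __init__(self, n_in, n_hidden, n_out, n_classes, seed=0):
        rng = random.Random(seed)
        self.w_shared = init_matrix(n_hidden, n_in, rng)
        self.b_shared = [0.0] * n_hidden
        self.w_heads = [init_matrix(n_out, n_hidden, rng) for _ in range(n_classes)]
        self.b_heads = [[0.0] * n_out for _ in range(n_classes)]

    def hidden(self, x):
        """Shared (class-independent) representation of the input."""
        return [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
                for row, b in zip(self.w_shared, self.b_shared)]

    def forward(self, x, class_id):
        """Predict with the output head that belongs to class_id."""
        h = self.hidden(x)
        return [sum(w * hi for w, hi in zip(row, h)) + b
                for row, b in zip(self.w_heads[class_id], self.b_heads[class_id])]

    def sgd_step(self, x, target, class_id, lr=0.05):
        """One squared-error SGD step; returns the loss before the update."""
        h = self.hidden(x)
        head_w, head_b = self.w_heads[class_id], self.b_heads[class_id]
        y = [sum(w * hi for w, hi in zip(row, h)) + b
             for row, b in zip(head_w, head_b)]
        err = [yi - ti for yi, ti in zip(y, target)]
        loss = 0.5 * sum(e * e for e in err)
        # Gradient w.r.t. the hidden activations, computed before the head changes.
        grad_h = [sum(err[o] * head_w[o][j] for o in range(len(err)))
                  for j in range(len(h))]
        # Class-dependent part: only the head of this sample's class is updated.
        for o, e in enumerate(err):
            for j, hj in enumerate(h):
                head_w[o][j] -= lr * e * hj
            head_b[o] -= lr * e
        # Shared part: updated by samples of every class (tanh' = 1 - h^2).
        for j, (gh, hj) in enumerate(zip(grad_h, h)):
            delta = gh * (1.0 - hj * hj)
            for i, xi in enumerate(x):
                self.w_shared[j][i] -= lr * delta * xi
            self.b_shared[j] -= lr * delta
        return loss


# Toy usage with made-up data: train on one sample labelled as class 0.
model = SharedHiddenMCL(n_in=3, n_hidden=4, n_out=2, n_classes=2)
x, target = [0.5, -1.0, 0.25], [1.0, -1.0]
losses = [model.sgd_step(x, target, class_id=0) for _ in range(50)]
```

In this sketch, training on a class-0 sample changes the shared layer and the class-0 head but leaves the class-1 head untouched, which is one simple way to keep universal and class-dependent parameters separate.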
