Speech Synthesis Using WaveNet Vocoder Based on Periodic/Aperiodic Decomposition

2018 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC). Publication date: 2018-11-01. DOI: 10.23919/APSIPA.2018.8659541

Takato Fujimoto, Takenori Yoshimura, Kei Hashimoto, Keiichiro Oura, Yoshihiko Nankaku, K. Tokuda

Citations: 4

Abstract

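This paper proposes a speech synthesis method that uses a WaveNet vocoder based on periodic/aperiodic decomposition. Human speech waveforms usually contain both quasi-periodic and aperiodic components, so modeling both accurately is important. In conventional statistical parametric speech synthesis, the periodic and aperiodic components are represented as ratios of their energies. Statistical parametric speech synthesis based on periodic/aperiodic decomposition has also been proposed; although its effectiveness has been demonstrated, it cannot directly generate speech waveforms that account for both components. In the proposed approach, the separated periodic and aperiodic components are modeled by a single acoustic model based on deep neural networks, and a single neural-network WaveNet vocoder then directly generates speech waveforms that account for both components. Experimental results show that the proposed approach outperforms the conventional approach in the naturalness of the synthesized speech.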
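
The contrast the abstract draws between an energy-ratio parameterization and a periodic/aperiodic decomposition can be illustrated with a toy example. The sketch below is not the authors' method. It assumes an idealized impulse-train periodic source, a Gaussian-noise aperiodic source, a single full-band aperiodicity ratio (practical systems typically use per-band ratios), and illustrative values FS = 16000 Hz and F0 = 100 Hz; all function names are hypothetical. It shows that two signals mixed with the same ratio cannot be told apart from that ratio alone, whereas a decomposition keeps the periodic and aperiodic components as separate waveforms.

```python
import math
import random

FS = 16000   # sampling rate in Hz (assumed for illustration)
F0 = 100.0   # fundamental frequency in Hz (assumed, held constant)


def pulse_train(n_samples, fs=FS, f0=F0):
    """Idealized periodic source: one unit impulse per pitch period."""
    period = int(round(fs / f0))
    x = [0.0] * n_samples
    for i in range(0, n_samples, period):
        x[i] = 1.0
    return x


def white_noise(n_samples, seed=0):
    """Idealized aperiodic source: zero-mean, unit-variance Gaussian noise."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(n_samples)]


def energy(x):
    """Sum of squared samples."""
    return sum(v * v for v in x)


def scale_to_energy(x, target):
    """Rescale x so that its energy equals target (zero target gives silence)."""
    e = energy(x)
    gain = math.sqrt(target / e) if e > 0 else 0.0
    return [gain * v for v in x]


def mix_by_ratio(periodic, aperiodic, ap_ratio):
    """Hypothetical ratio-style mixing with one full-band ratio.

    Scales the components so that
    E_aperiodic / (E_periodic + E_aperiodic) == ap_ratio,
    then returns the summed waveform and the two scaled components.
    """
    n = len(periodic)
    p = scale_to_energy(periodic, (1.0 - ap_ratio) * n)
    a = scale_to_energy(aperiodic, ap_ratio * n)
    return [pv + av for pv, av in zip(p, a)], p, a


def aperiodicity_ratio(periodic, aperiodic):
    """Scalar summary kept by a ratio-only parameterization."""
    ep, ea = energy(periodic), energy(aperiodic)
    return ea / (ep + ea)


if __name__ == "__main__":
    n = 1600  # 100 ms at the assumed 16 kHz rate
    pulses = pulse_train(n)
    # Same ratio, two different noise realizations for the aperiodic part.
    y1, p1, a1 = mix_by_ratio(pulses, white_noise(n, seed=1), 0.3)
    y2, p2, a2 = mix_by_ratio(pulses, white_noise(n, seed=2), 0.3)
    # The ratio cannot distinguish the two signals...
    print(round(aperiodicity_ratio(p1, a1), 6), round(aperiodicity_ratio(p2, a2), 6))
    # ...while the decomposed aperiodic waveforms themselves differ.
    print(a1 == a2)
```

Running the script reports the same ratio for both mixtures but different aperiodic waveforms, which is the kind of detail a ratio-only representation discards and a decomposition-based model can retain.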
