Speech Synthesis Using WaveNet Vocoder Based on Periodic/Aperiodic Decomposition
Authors: Takato Fujimoto, Takenori Yoshimura, Kei Hashimoto, Keiichiro Oura, Yoshihiko Nankaku, K. Tokuda
Venue: 2018 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)
Published: 2018-11-01
DOI: 10.23919/APSIPA.2018.8659541 (https://doi.org/10.23919/APSIPA.2018.8659541)
Human speech waveforms contain both quasi-periodic and aperiodic components, so modeling both accurately is important for speech synthesis. Conventional statistical parametric speech synthesis represents the two components as ratios of their energies. Statistical parametric speech synthesis based on periodic/aperiodic decomposition has also been proposed; although its effectiveness has been shown, it cannot directly generate speech waveforms that take both components into account. This paper proposes speech synthesis using a WaveNet vocoder based on periodic/aperiodic decomposition. In the proposed approach, the separated periodic and aperiodic components are modeled by a single acoustic model based on deep neural networks, and speech waveforms that take both components into account are then generated directly by a single WaveNet vocoder. Experimental results show that the proposed approach outperforms the conventional approach in the naturalness of the synthesized speech.
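To make the idea of periodic/aperiodic decomposition and an energy-ratio representation concrete, the following is a minimal sketch and not the authors' method. It assumes the fundamental frequency of a short frame is already known, fits a fixed number of harmonics by least squares to obtain a periodic part, and treats the residual as the aperiodic part. The function names `decompose` and `aperiodic_energy_ratio`, the harmonic count, and the synthetic test signal are all hypothetical choices made for illustration.

```python
import numpy as np


def decompose(x, f0, fs, n_harmonics=10):
    """Split a frame into a harmonic (periodic) part and a residual (aperiodic) part.

    Assumption: f0 is known and constant over the frame. This is a plain
    least-squares harmonic fit, used here only as an illustration.
    """
    t = np.arange(len(x)) / fs
    k = np.arange(1, n_harmonics + 1)
    phase = 2.0 * np.pi * f0 * np.outer(t, k)          # shape (samples, harmonics)
    basis = np.hstack([np.cos(phase), np.sin(phase)])  # cosine and sine terms
    coef, *_ = np.linalg.lstsq(basis, x, rcond=None)
    periodic = basis @ coef
    aperiodic = x - periodic
    return periodic, aperiodic


def aperiodic_energy_ratio(periodic, aperiodic):
    """Fraction of the frame energy carried by the aperiodic part."""
    e_p = np.sum(periodic ** 2)
    e_a = np.sum(aperiodic ** 2)
    return e_a / (e_p + e_a)


# Synthetic 50 ms frame: five harmonics of 200 Hz plus weak white noise.
rng = np.random.default_rng(0)
fs, f0 = 16000, 200.0
t = np.arange(800) / fs
clean = sum(np.sin(2.0 * np.pi * f0 * k * t) / k for k in range(1, 6))
x = clean + 0.1 * rng.standard_normal(len(t))

periodic, aperiodic = decompose(x, f0, fs)
ratio = aperiodic_energy_ratio(periodic, aperiodic)
print(f"aperiodic energy ratio: {ratio:.4f}")
```

The energy ratio collapses the aperiodic part to a single number per frame (or per band in practice), whereas keeping `periodic` and `aperiodic` as separate signals preserves their structure, which is the distinction the abstract draws between the conventional representation and decomposition-based approaches.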