{"title":"拉普拉斯分布输出的浅波声码器研究","authors":"Patrick Lumban Tobing, Tomoki Hayashi, T. Toda","doi":"10.1109/ASRU46091.2019.9003800","DOIUrl":null,"url":null,"abstract":"In this paper, an investigation of shallow architecture and Laplacian distribution output for WaveNet vocoder trained with limited training data is presented. The use of shallower WaveNet architecture is proposed to accommodate the possibility of more suitable use case with limited data and to reduce the computation time. In order to further improve the modeling of WaveNet vocoder, the use of Laplacian distribution output is proposed. Laplacian distribution is inherently a sparse distribution, with higher peak and fatter tail than the Gaussian, which might be more suitable for speech signal modeling. The experimental results demonstrate that: 1) the proposed shallow variant of WaveNet architecture gives comparable performance compared to the deep one with softmax output, while reducing the computation time by 73%; and 2) the use of Laplacian distribution output consistently improves the speech quality in various amounts of limited training data, reaching a value of 4.22 for the two highest mean opinion scores.","PeriodicalId":150913,"journal":{"name":"2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU)","volume":"240 ","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2019-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":"{\"title\":\"Investigation of Shallow Wavenet Vocoder with Laplacian Distribution Output\",\"authors\":\"Patrick Lumban Tobing, Tomoki Hayashi, T. Toda\",\"doi\":\"10.1109/ASRU46091.2019.9003800\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"In this paper, an investigation of shallow architecture and Laplacian distribution output for WaveNet vocoder trained with limited training data is presented. The use of shallower WaveNet architecture is proposed to accommodate the possibility of more suitable use case with limited data and to reduce the computation time. In order to further improve the modeling of WaveNet vocoder, the use of Laplacian distribution output is proposed. Laplacian distribution is inherently a sparse distribution, with higher peak and fatter tail than the Gaussian, which might be more suitable for speech signal modeling. 
The experimental results demonstrate that: 1) the proposed shallow variant of WaveNet architecture gives comparable performance compared to the deep one with softmax output, while reducing the computation time by 73%; and 2) the use of Laplacian distribution output consistently improves the speech quality in various amounts of limited training data, reaching a value of 4.22 for the two highest mean opinion scores.\",\"PeriodicalId\":150913,\"journal\":{\"name\":\"2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU)\",\"volume\":\"240 \",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2019-12-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"1\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ASRU46091.2019.9003800\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ASRU46091.2019.9003800","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Investigation of Shallow Wavenet Vocoder with Laplacian Distribution Output
In this paper, an investigation of a shallow architecture and a Laplacian distribution output for a WaveNet vocoder trained with limited data is presented. A shallower WaveNet architecture is proposed to better suit use cases with limited training data and to reduce computation time. To further improve the modeling capability of the WaveNet vocoder, the use of a Laplacian distribution output is proposed. The Laplacian distribution is inherently sparse, with a higher peak and a fatter tail than the Gaussian, which may make it more suitable for speech signal modeling. The experimental results demonstrate that: 1) the proposed shallow variant of the WaveNet architecture gives performance comparable to the deep architecture with a softmax output while reducing computation time by 73%; and 2) the Laplacian distribution output consistently improves speech quality across various amounts of limited training data, reaching a value of 4.22 for the two highest mean opinion scores.
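The Laplacian output described above replaces the categorical softmax over quantized sample values with a continuous density whose parameters the network predicts for each waveform sample. As a rough illustration only (not the authors' implementation; the PyTorch framing, tensor names, and parameterization via a predicted location and log-scale are assumptions), the corresponding negative log-likelihood loss can be sketched as follows:

```python
# Minimal sketch of a Laplacian negative log-likelihood loss for a neural
# vocoder output layer. Tensor names and shapes are hypothetical.
import math
import torch

def laplacian_nll(mu: torch.Tensor, log_b: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    """Mean negative log-likelihood of `target` under Laplace(mu, b), b = exp(log_b).

    Laplace density: p(x) = 1 / (2b) * exp(-|x - mu| / b), which has a sharper
    peak and a heavier tail than a Gaussian of comparable variance.
    """
    b = torch.exp(log_b)
    nll = log_b + math.log(2.0) + torch.abs(target - mu) / b
    return nll.mean()

# Example: the network predicts a location and a log-scale per waveform sample.
mu = torch.zeros(4, 16000)               # predicted location parameters
log_b = torch.zeros(4, 16000)            # predicted log-scale parameters
target = torch.rand(4, 16000) * 2 - 1    # waveform samples in [-1, 1]
loss = laplacian_nll(mu, log_b, target)
```

Relative to a softmax over quantized amplitudes, such a continuous output needs only two parameters per sample rather than one logit per quantization level, which is one plausible reason it can help when training data are limited.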