WaveNet Factorization with Singular Value Decomposition for Voice Conversion
Hongqiang Du, Xiaohai Tian, Lei Xie, Haizhou Li
2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), 2019-12-01
DOI: 10.1109/ASRU46091.2019.9003801 (https://doi.org/10.1109/ASRU46091.2019.9003801)
Citations: 4
Abstract
The WaveNet vocoder has shown a clear advantage over traditional vocoders in voice quality. However, training a speaker-dependent WaveNet vocoder usually requires a relatively large amount of speech data. It therefore remains a challenge to build a high-quality WaveNet vocoder for low-resource tasks such as voice conversion, where speech samples are limited in real applications. We propose using singular value decomposition (SVD) to reduce the number of WaveNet parameters while maintaining output voice quality. Specifically, we apply SVD to the dilated convolution layers and impose a semi-orthogonal constraint to improve performance. Experiments on the CMU-ARCTIC database show that, compared with the original WaveNet vocoder, the proposed method maintains similar performance in terms of both quality and similarity while using much less training data.
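The factorization idea in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the kernel shape (64 output channels, 64 input channels, width 2), the rank of 16, and the helper names `factorize_conv_weight` and `semi_orthogonal_error` are all assumptions made for illustration, and the abstract does not say how the semi-orthogonal constraint is enforced during training. The sketch only shows how a truncated SVD splits one convolution weight into two smaller factors, one of which is semi-orthogonal by construction, and how that reduces the parameter count.

```python
import numpy as np


def factorize_conv_weight(weight, rank):
    """Split a conv kernel of shape (out_ch, in_ch, k) into two low-rank factors.

    Hypothetical helper: returns A (out_ch, rank) and B (rank, in_ch * k)
    such that A @ B approximates the flattened kernel.
    """
    out_ch, in_ch, k = weight.shape
    matrix = weight.reshape(out_ch, in_ch * k)
    u, s, vt = np.linalg.svd(matrix, full_matrices=False)
    a = u[:, :rank] * s[:rank]  # singular values folded into the left factor
    b = vt[:rank, :]            # rows are orthonormal, so B is semi-orthogonal
    return a, b


def semi_orthogonal_error(b):
    """Frobenius norm of B B^T - I; zero when the rows of B are orthonormal."""
    return np.linalg.norm(b @ b.T - np.eye(b.shape[0]))


rng = np.random.default_rng(0)
weight = rng.standard_normal((64, 64, 2))  # assumed dilated-conv kernel shape
a, b = factorize_conv_weight(weight, rank=16)

original_params = weight.size      # 64 * 64 * 2
factored_params = a.size + b.size  # 64 * 16 + 16 * 128
approx = (a @ b).reshape(weight.shape)
rel_error = np.linalg.norm(weight - approx) / np.linalg.norm(weight)

print(original_params, factored_params)
print(f"relative reconstruction error: {rel_error:.3f}")
print(f"semi-orthogonality error: {semi_orthogonal_error(b):.2e}")
```

In a trained network the two factors would replace the original layer as a pair of smaller layers. A random kernel like the one above has no low-rank structure, so its reconstruction error at rank 16 is large; the approach is only useful when trained weights are close to low rank or when the factored network is trained or fine-tuned directly.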