F0 transformation techniques for statistical voice conversion with direct waveform modification with spectral differential
Kazuhiro Kobayashi, T. Toda, Satoshi Nakamura
2016 IEEE Spoken Language Technology Workshop (SLT), 2016-12-01
DOI: 10.1109/SLT.2016.7846338 (https://doi.org/10.1109/SLT.2016.7846338)
Citations: 16
Abstract
This paper presents several F0 transformation techniques for statistical voice conversion (VC) based on direct waveform modification with spectral differential (DIFFVC). Statistical VC converts the speaker identity of a source speaker's voice into that of a target speaker by converting several acoustic features, such as spectral and excitation features. It usually uses a vocoder to generate converted speech waveforms from the converted acoustic features. However, the vocoder often degrades the speech quality of the converted voice owing to insufficient parameterization accuracy. To avoid this issue, we have proposed a direct waveform modification technique based on spectral differential filtering and have successfully applied it to intra-gender singing VC (DIFFSVC), where excitation features do not necessarily need to be converted. We have also applied it to cross-gender singing VC by implementing F0 transformation with a constant rate, such as a one-octave increase or decrease. However, it is not straightforward to apply the DIFFSVC framework to normal speech conversion, because the F0 transformation ratio varies widely depending on the combination of source and target speakers. In this paper, we propose several F0 transformation techniques for DIFFVC and compare their performance in terms of the speech quality of the converted voice and the conversion accuracy of speaker individuality. The experimental results demonstrate that the F0 transformation technique based on waveform modification achieves the best performance among the proposed techniques.
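
To illustrate why a single constant rate (such as one octave) does not carry over to normal speech, the following minimal sketch estimates an F0 transformation ratio per speaker pair. It is not the method of the paper: the ratio is assumed here to be the mean voiced F0 of the target divided by that of the source, unvoiced frames are assumed to be marked with 0, and the names `f0_ratio` and `scale_f0` are hypothetical.

```python
from statistics import mean


def f0_ratio(source_f0, target_f0):
    """Estimate an F0 transformation ratio for one speaker pair.

    Assumption (not from the paper): the ratio is mean voiced target F0
    divided by mean voiced source F0. Frames with F0 <= 0 are unvoiced.
    """
    src = [f for f in source_f0 if f > 0]
    tgt = [f for f in target_f0 if f > 0]
    if not src or not tgt:
        raise ValueError("need at least one voiced frame per speaker")
    return mean(tgt) / mean(src)


def scale_f0(f0, ratio):
    """Scale voiced frames by the ratio; keep unvoiced frames at 0."""
    return [f * ratio if f > 0 else 0.0 for f in f0]


# Toy F0 contours in Hz (illustrative values only, not measured data).
source = [0, 100, 110, 120, 0]
target_a = [200, 220, 240]   # pair A happens to sit one octave above the source
target_b = [150, 165, 180]   # pair B needs a different, non-octave ratio

ratio_a = f0_ratio(source, target_a)
ratio_b = f0_ratio(source, target_b)
converted = scale_f0(source, ratio_b)
```

With these toy contours the two pairs yield different ratios, so a fixed octave shift would suit pair A but not pair B.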