Voice conversion based on continuous frequency warping and magnitude scaling
Authors: Yuhang Ye, B. Lawlor
Venue: 2017 28th Irish Signals and Systems Conference (ISSC)
Publication date: 2017-06-01
DOI: 10.1109/ISSC.2017.7983598 (https://doi.org/10.1109/ISSC.2017.7983598)
Citations: 0
Abstract
In this paper, we present a novel spectrum mapping method, Continuous Frequency Warping and Magnitude Scaling (CFWMS), for voice conversion under the Joint Density Gaussian Mixture Model (JDGMM) framework. JDGMM is a mature clustering technique that models the joint probability density of speech signals from paired speakers. Conventional JDGMM-based approaches morph the spectral features via least-squares optimization. However, speech quality degrades because the converted features are blurred by statistical smoothing, and the uncorrelated conversion functions of adjacent frames cause noticeable distortion. To address this, CFWMS uses a twofold frame-level conversion method, Frequency Warping and Magnitude Scaling (FWMS), which operates directly on signals in the frequency domain without statistical smoothing. Moreover, a trajectory limitation strategy is introduced to smooth the discontinuities between adjacent frames. The proposed solution does not require global, sentence-level information, making it feasible for low-latency (e.g. real-time) applications. The experimental results show significant improvements in terms of speech quality and perceptual identity.
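The abstract does not give the form of the warping function, the scaling parameters, or the trajectory limit, so the following is only an illustrative sketch of the general idea, not the authors' method. It assumes a piecewise-linear warp defined by anchor frequencies, per-bin gains in dB, and a simple causal clamp on the frame-to-frame parameter change. The names `warp_and_scale` and `limit_trajectory` and all parameters are hypothetical.

```python
import numpy as np


def warp_and_scale(mag, warp_src, warp_dst, gain_db):
    """Warp one magnitude-spectrum frame in frequency, then scale it.

    Assumptions (not from the paper): frequencies are normalized to [0, 1],
    the warp is piecewise linear and maps warp_src[i] to warp_dst[i], and
    the magnitude scaling is a per-bin gain expressed in dB.
    """
    mag = np.asarray(mag, dtype=float)
    grid = np.linspace(0.0, 1.0, len(mag))
    # Inverse warp: for each output frequency, find the source frequency.
    src_freq = np.interp(grid, warp_dst, warp_src)
    # Resample the source spectrum at those frequencies.
    warped = np.interp(src_freq, grid, mag)
    # Convert dB gains to linear amplitude factors and apply them.
    return warped * 10.0 ** (np.asarray(gain_db, dtype=float) / 20.0)


def limit_trajectory(params, max_step):
    """Clamp the frame-to-frame change of conversion parameters.

    Causal by construction: frame t depends only on frames up to t, so no
    sentence-level information is needed. The clamp rule is an assumption.
    """
    params = np.asarray(params, dtype=float)
    out = np.empty_like(params)
    out[0] = params[0]
    for t in range(1, len(params)):
        step = np.clip(params[t] - out[t - 1], -max_step, max_step)
        out[t] = out[t - 1] + step
    return out


# Usage: move a spectral peak from normalized frequency 0.25 to 0.5.
grid = np.linspace(0.0, 1.0, 101)
frame = np.exp(-((grid - 0.25) / 0.05) ** 2)
converted = warp_and_scale(frame, [0.0, 0.25, 1.0], [0.0, 0.5, 1.0],
                           np.zeros(101))
smoothed = limit_trajectory([0.0, 1.0, 1.0, 0.0], max_step=0.25)
```

Because the clamp in `limit_trajectory` looks only at the previous output frame, this sketch can run frame by frame, which matches the low-latency motivation stated in the abstract.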