Authors: Yuhang Ye, B. Lawlor
Venue: 2017 28th Irish Signals and Systems Conference (ISSC)
Published: 2017-06-01
DOI: 10.1109/ISSC.2017.7983598
Voice conversion based on continuous frequency warping and magnitude scaling
In this paper, we present a novel spectrum mapping method, Continuous Frequency Warping and Magnitude Scaling (CFWMS), for voice conversion under the Joint Density Gaussian Mixture Model (JDGMM) framework. JDGMM is a mature clustering technique that models the joint probability density of speech signals from paired speakers. Conventional JDGMM-based approaches map the spectral features via least-squares optimization. However, speech quality degrades for two reasons: the converted features are blurred by statistical smoothing, and the uncorrelated conversion functions of adjacent frames cause noticeable distortion. To address this, CFWMS uses a twofold frame-level conversion method, Frequency Warping and Magnitude Scaling (FWMS), which operates directly on signals in the frequency domain without statistical smoothing. Moreover, a trajectory limitation strategy is introduced to repair the discontinuities between adjacent frames. The proposed solution does not require global information about the sentence, making it feasible for low-latency (e.g. real-time) applications. The experimental results show significant improvements in both speech quality and perceptual identity.
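The abstract does not give the warping function, the scaling rule, or the exact form of the trajectory limitation, so the following is only a minimal sketch of the general idea and not the authors' implementation. It assumes a single linear warping factor per frame (`alpha`), a per-bin gain vector, and a hypothetical `max_step` bound on how far `alpha` may move between adjacent frames. In the paper these parameters would be derived from the JDGMM; here they are simply passed in. All function and parameter names are illustrative.

```python
from typing import List, Sequence, Tuple


def warp_frequency(mag: Sequence[float], alpha: float) -> List[float]:
    """Resample a magnitude spectrum so that source bin k moves to bin alpha * k.

    Assumption: a plain linear warp with linear interpolation between bins.
    """
    n = len(mag)
    out = []
    for k in range(n):
        src = k / alpha                  # position of output bin k in the source
        lo = int(src)
        if lo >= n - 1:                  # past the last bin: hold the edge value
            out.append(mag[-1])
            continue
        frac = src - lo
        out.append((1.0 - frac) * mag[lo] + frac * mag[lo + 1])
    return out


def scale_magnitude(mag: Sequence[float], gains: Sequence[float]) -> List[float]:
    """Multiply each frequency bin by its own gain."""
    return [m * g for m, g in zip(mag, gains)]


def limit_step(prev: float, target: float, max_step: float) -> float:
    """Move from prev toward target, but by at most max_step (assumed limiter)."""
    return prev + max(-max_step, min(max_step, target - prev))


def convert_frames(
    frames: Sequence[Sequence[float]],
    alphas: Sequence[float],
    gains: Sequence[Sequence[float]],
    max_step: float,
) -> Tuple[List[List[float]], List[float]]:
    """Convert frames one at a time, using only the previous frame's state."""
    converted, used_alphas = [], []
    prev = None
    for mag, alpha, gain in zip(frames, alphas, gains):
        if prev is not None:
            alpha = limit_step(prev, alpha, max_step)   # keep the trajectory smooth
        prev = alpha
        used_alphas.append(alpha)
        converted.append(scale_magnitude(warp_frequency(mag, alpha), gain))
    return converted, used_alphas


if __name__ == "__main__":
    frames = [[0.0, 1.0, 2.0, 3.0]] * 3
    out, used = convert_frames(frames, [1.0, 2.0, 1.0], [[1.0] * 4] * 3, 0.25)
    print(used)   # the jump to 2.0 is limited to one max_step from 1.0
    print(out[1])
```

Because each frame depends only on the previous frame's warping factor, the loop is causal and needs no sentence-level information, which is the property the abstract highlights for low-latency use.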