Speech analysis/synthesis by Gaussian mixture approximation of the speech spectrum for voice conversion

Proceedings of the ... IEEE International Symposium on Signal Processing and Information Technology. Pub Date: 2013-12-01. DOI: 10.1109/ISSPIT.2013.6781919

Jamal Amini, Abdoreza Sabzi Shahrebabaki, Navid Shokouhi, H. Sheikhzadeh, K. Raahemifar, M. Eslami

Pages: 000428-000433

Citations: 4

Abstract

Voice conversion typically relies on spectral features to map a source voice to a target voice. In this paper, we propose a simple method of fitting the STRAIGHT spectrum with Gaussian mixture (GM) models for speech analysis/synthesis and spectral modification. The means of the Gaussians are fixed in advance according to Mel-frequency spacing. The standard deviations are adjusted adaptively using the constant-Q principle and the spectrum amplitudes. Finally, the weights of the Gaussians are determined by sampling the log-spectrum at the Mel frequencies. The proposed method (MFLS-GM) is applied to both speech analysis/synthesis and voice conversion. Subjective evaluations using MOS and ABX tests show that voice conversion with MFLS-GM outperforms systems that use MFCC features. The computational cost of the proposed analysis/synthesis method is also much lower than that of MFCC-based methods.
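
The three steps named in the abstract (Mel-spaced means, frequency-dependent widths, weights sampled from the log-spectrum) can be illustrated with a rough NumPy sketch. This is not the authors' implementation: the abstract gives no formulas, so the Mel formula, the rule tying each width to the local spacing of the centres (a stand-in for the constant-Q rule, with the amplitude-dependent adjustment left out), the normalisation of the Gaussian basis, and all function and parameter names below are assumptions made for illustration only.

```python
import numpy as np


def hz_to_mel(f_hz):
    """Common Mel-scale formula (assumed; the abstract does not give one)."""
    return 2595.0 * np.log10(1.0 + np.asarray(f_hz, dtype=float) / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (np.asarray(mel, dtype=float) / 2595.0) - 1.0)


def mel_centers(n_gauss, fs):
    """Gaussian means in Hz, equally spaced on the Mel scale inside (0, fs/2)."""
    top_mel = float(hz_to_mel(fs / 2.0))
    mel_grid = np.linspace(0.0, top_mel, n_gauss + 2)
    return mel_to_hz(mel_grid[1:-1])  # drop the two end points


def analyze(log_spec, freqs, centers):
    """Weights: the log-spectrum sampled (linear interpolation) at the means."""
    return np.interp(centers, freqs, log_spec)


def synthesize(weights, centers, freqs, width_scale=1.0):
    """Rebuild a smooth log-spectrum from the weights.

    Assumption: each standard deviation is proportional to the local spacing
    of the centres, so widths grow with frequency. The amplitude-dependent
    adjustment mentioned in the abstract is not modelled here.
    """
    sigmas = width_scale * np.gradient(centers)
    z = (freqs[:, None] - centers[None, :]) / sigmas[None, :]
    basis = np.exp(-0.5 * z ** 2)
    # Assumption: normalise so the basis sums to one at every frequency bin.
    basis /= basis.sum(axis=1, keepdims=True)
    return basis @ weights
```

With these choices each frame is described by `n_gauss` numbers, and spectral modification amounts to editing the weights before calling `synthesize`. Because the basis is normalised, the rebuilt envelope is a weighted average of the sampled values, so it is smooth but cannot exceed the largest sampled weight.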
