F0 transformation techniques for statistical voice conversion with direct waveform modification with spectral differential

Kazuhiro Kobayashi, T. Toda, Satoshi Nakamura
Published in: 2016 IEEE Spoken Language Technology Workshop (SLT)
Publication date: 2016-12-01
DOI: 10.1109/SLT.2016.7846338 (https://doi.org/10.1109/SLT.2016.7846338)
Citations: 16

Abstract

This paper presents several F0 transformation techniques for statistical voice conversion (VC) with direct waveform modification with spectral differential (DIFFVC). Statistical VC is a technique for converting the speaker identity of a source speaker's voice into that of a target speaker by converting several acoustic features, such as spectral and excitation features. This technique usually uses a vocoder to generate converted speech waveforms from the converted acoustic features. However, the use of a vocoder often degrades the speech quality of the converted voice owing to insufficient parameterization accuracy. To avoid this issue, we have proposed a direct waveform modification technique based on spectral differential filtering and have successfully applied it to intra-gender singing VC (DIFFSVC), where excitation features do not necessarily need to be converted. Moreover, we have also applied it to cross-gender singing VC by implementing F0 transformation with a constant rate, such as a one-octave increase or decrease. On the other hand, it is not straightforward to apply the DIFFSVC framework to normal speech conversion because the F0 transformation ratio varies widely depending on the combination of source and target speakers. In this paper, we propose several F0 transformation techniques for DIFFVC and compare their performance in terms of the speech quality of the converted voice and the conversion accuracy of speaker individuality. The experimental results demonstrate that the F0 transformation technique based on waveform modification achieves the best performance among the proposed techniques.
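To make the notion of an F0 transformation ratio concrete, the following is a minimal sketch and not the authors' implementation. It assumes, purely for illustration, that the ratio for a speaker pair is the ratio of mean voiced F0 values and that unvoiced frames are marked with 0. The function names `f0_ratio`, `ratio_to_octaves`, and `resample_linear` are hypothetical. The resampling step only shows why changing F0 directly on the waveform also changes duration, which a complete system would have to compensate with some form of time-scale modification.

```python
import math


def f0_ratio(source_f0, target_f0):
    """Assumed definition: mean voiced F0 of the target over that of the source.

    Frames with F0 == 0 are treated as unvoiced and skipped.
    """
    src = [f for f in source_f0 if f > 0]
    tgt = [f for f in target_f0 if f > 0]
    if not src or not tgt:
        raise ValueError("need at least one voiced frame per speaker")
    return (sum(tgt) / len(tgt)) / (sum(src) / len(src))


def ratio_to_octaves(ratio):
    """Express an F0 ratio in octaves (a ratio of 2.0 is one octave up)."""
    return math.log2(ratio)


def resample_linear(signal, ratio):
    """Read `signal` `ratio` times faster using linear interpolation.

    Played back at the original sampling rate, the result has its pitch
    scaled by `ratio` and its duration scaled by 1 / `ratio`.
    """
    if ratio <= 0:
        raise ValueError("ratio must be positive")
    n_out = int((len(signal) - 1) / ratio) + 1
    out = []
    for i in range(n_out):
        pos = i * ratio
        lo = int(pos)
        hi = min(lo + 1, len(signal) - 1)
        frac = pos - lo
        out.append((1 - frac) * signal[lo] + frac * signal[hi])
    return out


# Hypothetical per-frame F0 contours in Hz (0 marks an unvoiced frame).
source = [100.0, 0.0, 100.0]
target = [200.0, 200.0, 0.0]
ratio = f0_ratio(source, target)      # 2.0
octaves = ratio_to_octaves(ratio)     # 1.0, i.e. one octave up
shorter = resample_linear(list(range(9)), ratio)  # 5 samples instead of 9
```

A fixed ratio of 2.0 or 0.5 corresponds to the one-octave case mentioned for cross-gender singing VC, whereas for normal speech the ratio differs for each source and target pair, which is the difficulty the abstract describes.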