Pitch modification based on syllable units for voice morphing system
Authors: Yinqiu Gao, Zhen Yang
Published in: 2007 IFIP International Conference on Network and Parallel Computing Workshops (NPC 2007)
Publication date: 2007-09-18
DOI: 10.1109/NPC.2007.11 (https://doi.org/10.1109/NPC.2007.11)
Citations: 5
Abstract
A voice morphing scheme is proposed that makes the speech of a source speaker sound as if it were uttered by a target speaker. The technique can conceal a speaker's identity, age, and gender during online chat and other applications that transform speech streams, which helps protect privacy on the Internet. Speaker individuality is transformed by altering the spectral envelope and by estimating the excitation signal: under a linear predictive coding (LPC) model, the fundamental (pitch) frequency of the residual signal of the source speech is modified in syllable units. The main advantage of the scheme is that it accounts for the dynamic characteristics of the pitch frequency rather than only its average level, which improves the performance of the whole conversion system compared with common approaches such as discrete pitch frequency mapping. In addition, line spectral frequency (LSF) vectors are aligned with a technique based on isolated syllables instead of the usual dynamic time warping (DTW) algorithm. Experimental results show that the system effectively transforms speaker identity while the converted speech maintains high quality.
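The abstract builds on the LPC source-filter split, in which a speech frame is separated into a spectral envelope (the LPC coefficients) and an excitation-like residual that carries the pitch pulses. The following is a minimal sketch of that split only, assuming the autocorrelation method with a Levinson-Durbin recursion. It is not the authors' implementation, and the paper's syllable segmentation, pitch modification, and LSF alignment are not shown. All function names, the test signal, and the parameter values (`fs`, `true_a`, the order of 2) are hypothetical choices for illustration.

```python
import numpy as np

def lpc_coefficients(frame, order):
    """Autocorrelation-method LPC solved with Levinson-Durbin.

    Returns a = [1, a1, ..., ap], the coefficients of the inverse
    filter A(z) = 1 + a1*z^-1 + ... + ap*z^-p.
    """
    n = len(frame)
    # Autocorrelation at lags 0..order
    r = np.array([np.dot(frame[:n - k], frame[k:]) for k in range(order + 1)])
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        # Reflection coefficient for model order i
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        k = -acc / err
        prev = a.copy()
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= 1.0 - k * k
    return a

def lpc_residual(frame, a):
    """Inverse-filter the frame with A(z) to obtain the residual."""
    return np.convolve(frame, a)[:len(frame)]

def synthesize(residual, a):
    """Run the all-pole filter 1/A(z) over a residual (zero initial state)."""
    p = len(a) - 1
    out = np.zeros(len(residual))
    for n in range(len(residual)):
        acc = residual[n]
        for j in range(1, min(p, n) + 1):
            acc -= a[j] * out[n - j]
        out[n] = acc
    return out

# Hypothetical test signal: a 100 Hz pulse train at fs = 8000 Hz,
# shaped by a stable two-pole resonator (assumed values, not from the paper).
fs = 8000
excitation = np.zeros(400)
excitation[::80] = 1.0
true_a = np.array([1.0, -1.3, 0.8])
speech_like = synthesize(excitation, true_a)

a = lpc_coefficients(speech_like, order=2)   # spectral envelope
residual = lpc_residual(speech_like, a)      # excitation estimate
rebuilt = synthesize(residual, a)            # envelope + residual -> signal
```

In a system of the kind the abstract describes, the residual would be the signal whose pitch is modified before resynthesis, and the coefficients would be converted to LSFs for the spectral-envelope mapping. Here the residual is passed through unchanged, so `rebuilt` simply reproduces `speech_like`.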