Improving the performance of MGM-based voice conversion by preparing training data method
Guoyu Zuo, Wenju Liu, Xiaogang Ruan
2004 International Symposium on Chinese Spoken Language Processing, 2004-12-15
DOI: https://doi.org/10.1109/CHINSL.2004.1409616
Citations: 0
Abstract
This paper proposes an approach that improves both the target speaker's individuality and the quality of the converted speech by carefully preparing the training data. In voice conversion based on mixture Gaussian spectral mapping (MGM), spectral feature representations are analyzed to find the correct feature associations between source and target characteristics. A voiced/unvoiced (V/UV) decision scheme is applied during time-alignment so that only correctly aligned data are used to train the MGM function and misaligned data are removed. Experiments evaluate how different spectral representation methods and V/UV decision strategies affect the MGM functions. When linear predictive cepstral coefficients (LPCC) are used for time-alignment and V/UV decisions are used to remove bad data, the results show that the conversion function becomes more accurate and that the proposed method effectively improves the overall performance of voice conversion.
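The data-preparation idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract gives no algorithmic details, so the sketch assumes that time-alignment is done with plain dynamic time warping (DTW) over per-frame feature vectors (LPCC vectors in the paper, toy one-dimensional values here), and that a frame pair counts as misaligned when its V/UV labels disagree. The function names `dtw_path` and `filter_by_vuv` are hypothetical.

```python
import math


def dtw_path(src, tgt):
    """Align two feature sequences with plain DTW; return (i, j) frame pairs.

    Assumption: Euclidean frame distance and the basic three-way step
    pattern. The paper's actual alignment settings are not stated.
    """
    n, m = len(src), len(tgt)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(src[i - 1], tgt[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])

    # Backtrack from the end of both sequences to recover the warping path.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min(
            (cost[i - 1][j - 1], i - 1, j - 1),
            (cost[i - 1][j], i - 1, j),
            (cost[i][j - 1], i, j - 1),
        )
    path.reverse()
    return path


def filter_by_vuv(path, src_voiced, tgt_voiced):
    """Drop aligned pairs whose V/UV labels disagree (assumed criterion)."""
    return [(i, j) for i, j in path if src_voiced[i] == tgt_voiced[j]]


# Toy stand-ins for per-frame feature vectors and V/UV flags.
src_feats = [[0.0], [1.0], [2.0]]
tgt_feats = [[0.0], [0.0], [1.0], [2.0]]
src_voiced = [True, True, False]
tgt_voiced = [True, False, True, False]

aligned = dtw_path(src_feats, tgt_feats)
training_pairs = filter_by_vuv(aligned, src_voiced, tgt_voiced)
print(training_pairs)
```

Only the frame pairs that survive the filter would be passed on to train the mapping function. A stricter variant, keeping only pairs where both frames are voiced, is equally plausible given the abstract.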