{"title":"噪声环境下基于样本的语音转换","authors":"R. Takashima, T. Takiguchi, Y. Ariki","doi":"10.1109/SLT.2012.6424242","DOIUrl":null,"url":null,"abstract":"This paper presents a voice conversion (VC) technique for noisy environments, where parallel exemplars are introduced to encode the source speech signal and synthesize the target speech signal. The parallel exemplars (dictionary) consist of the source exemplars and target exemplars, having the same texts uttered by the source and target speakers. The input source signal is decomposed into the source exemplars, noise exemplars obtained from the input signal, and their weights (activities). Then, by using the weights of the source exemplars, the converted signal is constructed from the target exemplars. We carried out speaker conversion tasks using clean speech data and noise-added speech data. The effectiveness of this method was confirmed by comparing its effectiveness with that of a conventional Gaussian Mixture Model (GMM)-based method.","PeriodicalId":375378,"journal":{"name":"2012 IEEE Spoken Language Technology Workshop (SLT)","volume":"48 2","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2012-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"139","resultStr":"{\"title\":\"Exemplar-based voice conversion in noisy environment\",\"authors\":\"R. Takashima, T. Takiguchi, Y. Ariki\",\"doi\":\"10.1109/SLT.2012.6424242\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"This paper presents a voice conversion (VC) technique for noisy environments, where parallel exemplars are introduced to encode the source speech signal and synthesize the target speech signal. The parallel exemplars (dictionary) consist of the source exemplars and target exemplars, having the same texts uttered by the source and target speakers. The input source signal is decomposed into the source exemplars, noise exemplars obtained from the input signal, and their weights (activities). Then, by using the weights of the source exemplars, the converted signal is constructed from the target exemplars. We carried out speaker conversion tasks using clean speech data and noise-added speech data. The effectiveness of this method was confirmed by comparing its effectiveness with that of a conventional Gaussian Mixture Model (GMM)-based method.\",\"PeriodicalId\":375378,\"journal\":{\"name\":\"2012 IEEE Spoken Language Technology Workshop (SLT)\",\"volume\":\"48 2\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2012-12-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"139\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2012 IEEE Spoken Language Technology Workshop (SLT)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/SLT.2012.6424242\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2012 IEEE Spoken Language Technology Workshop (SLT)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/SLT.2012.6424242","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Exemplar-based voice conversion in noisy environment
This paper presents a voice conversion (VC) technique for noisy environments, where parallel exemplars are introduced to encode the source speech signal and synthesize the target speech signal. The parallel exemplars (dictionary) consist of the source exemplars and target exemplars, having the same texts uttered by the source and target speakers. The input source signal is decomposed into the source exemplars, noise exemplars obtained from the input signal, and their weights (activities). Then, by using the weights of the source exemplars, the converted signal is constructed from the target exemplars. We carried out speaker conversion tasks using clean speech data and noise-added speech data. The effectiveness of this method was confirmed by comparing its effectiveness with that of a conventional Gaussian Mixture Model (GMM)-based method.