Voice Conversion of Synthesized Speeches Using Deep Neural Networks
Young-Sun Yun, Jinmang Jung, Seongbae Eun, Shin Cha, S. So
2019 International Conference on Green and Human Information Technology (ICGHIT), 2019
DOI: 10.1109/ICGHIT.2019.00029
Citations: 2
Abstract
Voice conversion is a technique for transforming speaker individuality from a source speaker to a target speaker. In previous studies, we proposed voice conversion of synthesized speech based on formant or line spectral information. That method applied a piecewise linear transform function to formant or LSP (line spectral pair) features over formant intervals. In this paper, we propose converting speaker individuality using a deep neural network. Alongside advances in deep neural network research, end-to-end speech conversion methods have been proposed, and their results produce more natural utterances than earlier approaches. Building on these, we examine representative speech generation methods and propose a voice conversion system that uses formant features as additional conditioning information for locally and globally conditioned deep neural networks.
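The piecewise linear transform of formant features mentioned above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the anchor frequencies, function name, and the use of `np.interp` (which interpolates linearly within each anchor interval) are all assumptions for demonstration.

```python
# Sketch of a piecewise linear formant transform: source formant
# frequencies are mapped into the target speaker's formant space.
# Anchor values below are hypothetical per-speaker average formants.
import numpy as np

def piecewise_linear_transform(src_formants, src_anchors, tgt_anchors):
    """Map source formant frequencies (Hz) to the target space.

    np.interp is linear between consecutive anchor points, so each
    formant interval gets its own linear mapping segment.
    """
    return np.interp(src_formants, src_anchors, tgt_anchors)

# Hypothetical average F1-F4 anchors for source and target speakers (Hz).
src_anchors = np.array([500.0, 1500.0, 2500.0, 3500.0])
tgt_anchors = np.array([550.0, 1650.0, 2600.0, 3700.0])

src_formants = np.array([700.0, 1500.0, 3000.0])
converted = piecewise_linear_transform(src_formants, src_anchors, tgt_anchors)
# e.g. 700 Hz lies 20% into the first interval, so it maps 20% into
# the corresponding target interval: 550 + 0.2 * (1650 - 550) = 770 Hz.
```

In the paper's proposed system, such formant features serve as additional conditioning input to the neural network rather than as the final transform themselves.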