Voice Conversion of Synthesized Speeches Using Deep Neural Networks
Young-Sun Yun, Jinmang Jung, Seongbae Eun, Shin Cha, S. So
2019 International Conference on Green and Human Information Technology (ICGHIT). DOI: 10.1109/ICGHIT.2019.00029
Voice conversion is a technique for transforming the individuality (speaker identity) of a source speaker's voice into that of a target speaker. In previous studies, we proposed voice conversion of synthesized speech based on formant or line spectral information. That method applied a piecewise linear transform function to formant or LSP (line spectral pair) features over formant intervals. In this paper, we propose converting speaker individuality using a deep neural network. With advances in deep neural network research, end-to-end speech conversion methods have been proposed, and they generate more natural utterances than earlier approaches. We survey representative speech generation methods among these and propose a voice conversion system that uses formant features as additional conditioning information for locally and globally conditioned deep neural networks.
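The piecewise linear transform over formant intervals is not specified in detail in this abstract. The following is a minimal sketch under stated assumptions: each speaker is described by an increasing list of breakpoints (0 Hz, the formant frequencies, and the Nyquist frequency), and a source frequency is mapped onto the target axis by linear interpolation inside the formant interval that contains it. The function name `piecewise_linear_warp` and the example formant values are hypothetical, not taken from the paper.

```python
from bisect import bisect_right


def piecewise_linear_warp(freq, src_anchors, tgt_anchors):
    """Map a frequency (Hz) from the source speaker's axis to the target's.

    Assumption (hypothetical layout): src_anchors / tgt_anchors are strictly
    increasing breakpoints such as [0, F1, F2, F3, nyquist]; each pair of
    neighbouring breakpoints bounds one formant interval.
    """
    if len(src_anchors) != len(tgt_anchors) or len(src_anchors) < 2:
        raise ValueError("need two anchor lists of equal length >= 2")
    for anchors in (src_anchors, tgt_anchors):
        if any(b <= a for a, b in zip(anchors, anchors[1:])):
            raise ValueError("anchors must be strictly increasing")
    if not src_anchors[0] <= freq <= src_anchors[-1]:
        raise ValueError("frequency outside the anchored range")

    # Index i of the source interval [src[i], src[i+1]] containing freq;
    # the min() keeps the upper endpoint inside the last interval.
    i = min(bisect_right(src_anchors, freq) - 1, len(src_anchors) - 2)
    s0, s1 = src_anchors[i], src_anchors[i + 1]
    t0, t1 = tgt_anchors[i], tgt_anchors[i + 1]

    # Linear interpolation within the matching target interval.
    return t0 + (freq - s0) * (t1 - t0) / (s1 - s0)


# Hypothetical formant breakpoints for a source and a target voice.
src = [0.0, 500.0, 1500.0, 2500.0, 4000.0]
tgt = [0.0, 600.0, 1700.0, 2700.0, 4000.0]
print(piecewise_linear_warp(1000.0, src, tgt))  # → 1150.0
```

Each formant interval receives its own slope, so source formants land exactly on target formants while frequencies in between are stretched or compressed proportionally. Since LSP values are also frequencies, the same mapping could be applied element-wise to LSP features on the same axis; the abstract does not state whether the original method did exactly this.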
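"Local and global conditioning" is terminology from WaveNet-style generators, where a gated activation z = tanh(W_f * x + V_f h + U_f y) ⊙ σ(W_g * x + V_g h + U_g y) receives a global vector h that stays constant over an utterance and a local sequence y that is time-aligned with the output. The abstract does not say how formant features enter the network, so this is only an assumed sketch: frame-level formant features are upsampled by repetition and used as y, while h stands for an utterance-level code such as a speaker embedding. The dilated causal convolution over x is simplified to a per-step matrix product, and all weight names (`W_f`, `V_f`, `U_f`, `W_g`, `V_g`, `U_g`) and helper functions are hypothetical.

```python
import math


def matvec(m, v):
    """Multiply matrix m (list of rows) by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def vec_add(*vecs):
    """Element-wise sum of equally sized vectors."""
    return [sum(xs) for xs in zip(*vecs)]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def upsample(frames, hop):
    """Repeat each frame-level feature vector `hop` times (nearest neighbour)."""
    return [frame for frame in frames for _ in range(hop)]


def gated_unit(x, h_global, y_local, p):
    """One gated activation for a single time step.

    x: input vector; h_global: utterance-level conditioning vector;
    y_local: time-aligned conditioning vector (assumed: formant features).
    p: dict of hypothetical weight matrices.
    """
    filt = vec_add(matvec(p["W_f"], x), matvec(p["V_f"], h_global),
                   matvec(p["U_f"], y_local))
    gate = vec_add(matvec(p["W_g"], x), matvec(p["V_g"], h_global),
                   matvec(p["U_g"], y_local))
    return [math.tanh(f) * sigmoid(g) for f, g in zip(filt, gate)]


def conditioned_layer(xs, h_global, formant_frames, hop, p):
    """Apply gated_unit over a sequence: h_global is reused at every step,
    while the local formant condition changes from frame to frame."""
    ys = upsample(formant_frames, hop)
    if len(ys) != len(xs):
        raise ValueError("upsampled formant frames must match the input length")
    return [gated_unit(x, h_global, y, p) for x, y in zip(xs, ys)]


# Toy example: one channel, one formant value per frame, hop of 2 samples.
params = {"W_f": [[0.0]], "V_f": [[0.0]], "U_f": [[1.0]],
          "W_g": [[0.0]], "V_g": [[0.0]], "U_g": [[0.0]]}
out = conditioned_layer([[0.0]] * 4, [1.0], [[0.1], [0.2]], 2, params)
```

In this toy setting the first two outputs share one formant frame and the last two share the next, which shows how a local condition can steer the generator at frame resolution while a global vector shifts every step equally.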