Authors: Changzeng Fu, Chaoran Liu, C. Ishi, H. Ishiguro
Venue: 2022 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)
Published: 2022-11-07
DOI: https://doi.org/10.23919/APSIPAASC55919.2022.9979821
C-CycleTransGAN: A Non-parallel Controllable Cross-gender Voice Conversion Model with CycleGAN and Transformer
In this study, we propose a cross-gender voice conversion (VC) model with controllable conversion intensity. A demo page is available at https://cz26.github.io/DemoPage-c-CycleTransGAN-VoiceConversion/. Specifically, we combine CycleGAN with a transformer module and build a condition embedding network that serves as an intensity controller. The model is first pre-trained with self-supervised learning on a single-gender voice reconstruction task, with the condition set to male-to-male or female-to-female. Once pre-training is complete, we fine-tune the model on the cross-gender voice conversion task, with the condition set to male-to-female or female-to-male. At test time, the condition is intended to serve as a controllable parameter (scale) that adjusts the conversion intensity. The proposed method was evaluated on the Voice Conversion Challenge dataset and compared with two baselines (CycleGAN and CycleTransGAN) using objective and subjective evaluations. The results show that the proposed approach adds cross-gender controllability without degrading voice conversion performance.
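The abstract does not specify how the condition is encoded or how the scale is applied, so the following is only a minimal sketch of one plausible reading. It assumes a hypothetical four-dimensional condition code (source gender, target gender) and treats the intensity scale as a linear interpolation between the same-gender condition used in pre-training and the cross-gender condition used in fine-tuning. The names `CONDITIONS`, `STAGE_CONDITIONS`, and `scaled_condition` are illustrative and do not come from the paper.

```python
from typing import Dict, List

# Hypothetical condition codes (an assumption, not the paper's encoding):
# [source_is_male, source_is_female, target_is_male, target_is_female]
CONDITIONS: Dict[str, List[float]] = {
    "m2m": [1.0, 0.0, 1.0, 0.0],  # male-to-male (reconstruction)
    "f2f": [0.0, 1.0, 0.0, 1.0],  # female-to-female (reconstruction)
    "m2f": [1.0, 0.0, 0.0, 1.0],  # male-to-female (conversion)
    "f2m": [0.0, 1.0, 1.0, 0.0],  # female-to-male (conversion)
}

# Which conditions each training stage uses, as described in the abstract.
STAGE_CONDITIONS: Dict[str, List[str]] = {
    "pretrain": ["m2m", "f2f"],
    "finetune": ["m2f", "f2m"],
}


def scaled_condition(source: str, intensity: float) -> List[float]:
    """Blend the same-gender and cross-gender condition codes.

    Assumed behaviour: intensity 0.0 gives the reconstruction condition,
    1.0 gives the full cross-gender condition, and values in between
    interpolate linearly.
    """
    if source not in ("m", "f"):
        raise ValueError("source must be 'm' or 'f'")
    if not 0.0 <= intensity <= 1.0:
        raise ValueError("intensity must lie in [0, 1]")
    target = "f" if source == "m" else "m"
    identity = CONDITIONS[f"{source}2{source}"]
    cross = CONDITIONS[f"{source}2{target}"]
    return [(1.0 - intensity) * a + intensity * b
            for a, b in zip(identity, cross)]
```

In a full model, a vector like this would presumably be fed to the condition embedding network rather than used directly; whether the actual system interpolates in this input space or scales a learned embedding is not stated in the abstract.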