{"title":"Boosting Star-GANs for Voice Conversion with Contrastive Discriminator","authors":"Shijing Si, Jianzong Wang, Xulong Zhang, Xiaoyang Qu, Ning Cheng, Jing Xiao","doi":"10.48550/arXiv.2209.10088","DOIUrl":null,"url":null,"abstract":"Nonparallel multi-domain voice conversion methods such as the StarGAN-VCs have been widely applied in many scenarios. However, the training of these models usually poses a challenge due to their complicated adversarial network architectures. To address this, in this work we leverage the state-of-the-art contrastive learning techniques and incorporate an efficient Siamese network structure into the StarGAN discriminator. Our method is called SimSiam-StarGAN-VC and it boosts the training stability and effectively prevents the discriminator overfitting issue in the training process. We conduct experiments on the Voice Conversion Challenge (VCC 2018) dataset, plus a user study to validate the performance of our framework. Our experimental results show that SimSiam-StarGAN-VC significantly outperforms existing StarGAN-VC methods in terms of both the objective and subjective metrics.","PeriodicalId":281152,"journal":{"name":"International Conference on Neural Information Processing","volume":"73 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-09-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"International Conference on Neural Information Processing","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.48550/arXiv.2209.10088","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 1
Abstract
Nonparallel multi-domain voice conversion methods such as StarGAN-VC have been widely applied in many scenarios. However, training these models is usually challenging because of their complicated adversarial network architectures. To address this, we leverage state-of-the-art contrastive learning techniques and incorporate an efficient Siamese network structure into the StarGAN discriminator. The resulting method, SimSiam-StarGAN-VC, improves training stability and effectively prevents discriminator overfitting during training. We conduct experiments on the Voice Conversion Challenge (VCC 2018) dataset, together with a user study, to validate the performance of our framework. The experimental results show that SimSiam-StarGAN-VC significantly outperforms existing StarGAN-VC methods on both objective and subjective metrics.
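The abstract only states that a SimSiam-style Siamese structure is attached to the StarGAN discriminator; the exact architecture is not given here. Below is a minimal, hypothetical PyTorch sketch of that idea: a toy discriminator backbone shared between an adversarial real/fake head and a SimSiam projector/predictor pair trained with a stop-gradient negative-cosine loss. All layer sizes, the mel-spectrogram input shape, the augmentation (small additive noise), and how the auxiliary loss is weighted against the adversarial loss are assumptions for illustration, not the paper's configuration.

```python
# Hypothetical sketch: SimSiam-style auxiliary head on a StarGAN-VC discriminator.
# Shapes, layer widths, and the augmentation are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class Discriminator(nn.Module):
    """Toy conv discriminator over mel-spectrogram inputs (assumed 1 x 36 x 128)."""

    def __init__(self, feat_dim=256):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(1, 32, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, feat_dim),
        )
        self.adv_head = nn.Linear(feat_dim, 1)  # real/fake logit
        # SimSiam-style projector + predictor on the shared features
        self.projector = nn.Sequential(
            nn.Linear(feat_dim, feat_dim), nn.BatchNorm1d(feat_dim), nn.ReLU(),
            nn.Linear(feat_dim, feat_dim),
        )
        self.predictor = nn.Sequential(
            nn.Linear(feat_dim, feat_dim // 2), nn.ReLU(),
            nn.Linear(feat_dim // 2, feat_dim),
        )

    def forward(self, x):
        h = self.backbone(x)
        return self.adv_head(h), h

    def simsiam_loss(self, x1, x2):
        """Symmetrized negative cosine similarity between the predictor output of
        one view and the stop-gradient projection of the other view."""
        z1 = self.projector(self.backbone(x1))
        z2 = self.projector(self.backbone(x2))
        p1, p2 = self.predictor(z1), self.predictor(z2)
        return -(F.cosine_similarity(p1, z2.detach(), dim=-1).mean()
                 + F.cosine_similarity(p2, z1.detach(), dim=-1).mean()) / 2


if __name__ == "__main__":
    d = Discriminator()
    mel = torch.randn(8, 1, 36, 128)  # batch of mel-spectrogram segments
    # Two lightly perturbed views of the same segment (augmentation is an assumption)
    view1 = mel + 0.01 * torch.randn_like(mel)
    view2 = mel + 0.01 * torch.randn_like(mel)
    adv_logit, _ = d(mel)
    aux = d.simsiam_loss(view1, view2)  # would be added to the discriminator loss
    print(adv_logit.shape, aux.item())
```

In this sketch the contrastive term acts as a regularizer on the discriminator's shared features, which is one plausible way such a Siamese branch could stabilize adversarial training and curb discriminator overfitting as the abstract describes.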