Exploiting Pre-trained Feature Networks for Generative Adversarial Networks in Audio-domain Loop Generation

Yen-Tung Yeh, Bo-Yu Chen, Yi-Hsuan Yang
{"title":"Exploiting Pre-trained Feature Networks for Generative Adversarial Networks in Audio-domain Loop Generation","authors":"Yen-Tung Yeh, Bo-Yu Chen, Yi-Hsuan Yang","doi":"10.48550/arXiv.2209.01751","DOIUrl":null,"url":null,"abstract":"While generative adversarial networks (GANs) have been widely used in research on audio generation, the training of a GAN model is known to be unstable, time consuming, and data inefficient. Among the attempts to ameliorate the training process of GANs, the idea of Projected GAN emerges as an effective solution for GAN-based image generation, establishing the state-of-the-art in different image applications. The core idea is to use a pre-trained classifier to constrain the feature space of the discriminator to stabilize and improve GAN training. This paper investigates whether Projected GAN can similarly improve audio generation, by evaluating the performance of a StyleGAN2-based audio-domain loop generation model with and without using a pre-trained feature space in the discriminator. Moreover, we compare the performance of using a general versus domain-specific classifier as the pre-trained audio classifier. 
With experiments on both drum loop and synth loop generation, we show that a general audio classifier works better, and that with Projected GAN our loop generation models can converge around 5 times faster without performance degradation.","PeriodicalId":309903,"journal":{"name":"International Society for Music Information Retrieval Conference","volume":"27 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-09-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"International Society for Music Information Retrieval Conference","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.48550/arXiv.2209.01751","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

Abstract

While generative adversarial networks (GANs) have been widely used in research on audio generation, the training of a GAN model is known to be unstable, time-consuming, and data-inefficient. Among attempts to ameliorate the training process of GANs, Projected GAN has emerged as an effective solution for GAN-based image generation, establishing the state of the art in several image applications. Its core idea is to use a pre-trained classifier to constrain the feature space of the discriminator, stabilizing and improving GAN training. This paper investigates whether Projected GAN can similarly improve audio generation by evaluating the performance of a StyleGAN2-based audio-domain loop generation model with and without a pre-trained feature space in the discriminator. Moreover, we compare the performance of using a general versus a domain-specific classifier as the pre-trained audio classifier. With experiments on both drum loop and synth loop generation, we show that a general audio classifier works better, and that with Projected GAN our loop generation models converge around 5 times faster without performance degradation.
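The core mechanism the abstract describes — a discriminator that judges samples in the feature space of a frozen, pre-trained classifier rather than on raw input — can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the fixed random linear map stands in for a real pre-trained audio backbone, and all names and dimensions (16 kHz one-second clips, a 64-dimensional feature space) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Frozen "pre-trained" feature extractor: a fixed random projection used as a
# stand-in for a real audio classifier backbone. These weights are never updated.
W_feat = rng.standard_normal((16000, 64)) * 0.01

# Trainable discriminator head operating on the frozen feature space.
# Only this part would receive gradient updates during GAN training.
W_head = rng.standard_normal((64, 1)) * 0.1

def discriminator(x):
    feats = np.maximum(x @ W_feat, 0.0)  # frozen features (ReLU nonlinearity)
    return feats @ W_head                # real/fake logit per example

x = rng.standard_normal((2, 16000))      # batch of two 1-second clips at 16 kHz
logits = discriminator(x)
print(logits.shape)  # (2, 1)
```

Constraining the discriminator to a fixed, well-structured feature space is what stabilizes training in the Projected GAN setup: the generator receives gradients through features a classifier has already learned, instead of through a discriminator learned from scratch.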