{"title":"C-SupConGAN: Using Contrastive Learning and Trained Data Features for Audio-to-Image Generation","authors":"Haechun Chung, Jong-Kook Kim","doi":"10.1145/3582099.3582121","DOIUrl":null,"url":null,"abstract":"In this paper, the audio-to-image generation problem is investigated, where appropriate images are generated from the audio input. A previous study, Cross-Modal Contrastive Representation Learning (CMCRL), trained using both audios and images to extract useful audio features for audio-to-image generation. The CMCRL upgraded the Generative Adversarial Networks (GAN) to achieve high performance in the generation learning phase, but the GAN showed training instability. In this paper, the C-SupConGAN that uses the conditional supervised contrastive loss (C-SupCon loss) is proposed. C-SupConGAN enhances the conditional contrastive loss (2C loss) of the Contrastive GAN (ContraGAN) that considers data-to-data relationships and data-to-class relationships in the discriminator. The audio and image embeddings extracted from the encoder pre-trained using CMCRL is used to further extend the C-SupCon loss. The extended C-SupCon loss additionally considers relations information between data embedding and the corresponding audio embedding (data-to-source relationships) or between data embedding and the corresponding image embedding (data-to-target relationships). Extensive experiments show that the proposed method improved performance, generates higher quality images for audio-to-image generation than previous research, and effectively alleviates the training collapse of GAN.","PeriodicalId":222372,"journal":{"name":"Proceedings of the 2022 5th Artificial Intelligence and Cloud Computing Conference","volume":"143 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-12-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 2022 5th Artificial Intelligence and Cloud Computing Conference","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3582099.3582121","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Abstract
In this paper, the audio-to-image generation problem is investigated, where appropriate images are generated from audio input. A previous study, Cross-Modal Contrastive Representation Learning (CMCRL), was trained on both audio and images to extract audio features useful for audio-to-image generation. CMCRL improved upon Generative Adversarial Networks (GANs) to achieve high performance in the generation learning phase, but the GAN exhibited training instability. In this paper, C-SupConGAN, which uses a conditional supervised contrastive loss (C-SupCon loss), is proposed. C-SupConGAN enhances the conditional contrastive loss (2C loss) of the Contrastive GAN (ContraGAN), which considers data-to-data and data-to-class relationships in the discriminator. The audio and image embeddings extracted from the encoder pre-trained with CMCRL are used to further extend the C-SupCon loss. The extended C-SupCon loss additionally considers relational information between a data embedding and the corresponding audio embedding (data-to-source relationships) or between a data embedding and the corresponding image embedding (data-to-target relationships). Extensive experiments show that the proposed method improves performance, generates higher-quality images for audio-to-image generation than previous research, and effectively alleviates GAN training collapse.
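For context, the 2C loss that the abstract builds on takes the following form as published in the ContraGAN paper (Kang and Park, NeurIPS 2020); the notation below is reconstructed from that paper, and the C-SupCon extension sketched afterwards is only a plausible reading of the abstract's description, not the authors' exact formulation:

\[
\ell_{2C}(x_i, y_i; t) = -\log \frac{\exp\!\big(l(x_i)^{\top} e(y_i)/t\big) + \sum_{k=1}^{m} \mathbb{1}_{[y_k = y_i]} \exp\!\big(l(x_i)^{\top} l(x_k)/t\big)}{\exp\!\big(l(x_i)^{\top} e(y_i)/t\big) + \sum_{k=1}^{m} \mathbb{1}_{[k \neq i]} \exp\!\big(l(x_i)^{\top} l(x_k)/t\big)}
\]

where \(l(\cdot)\) is the discriminator's embedding of an image \(x_i\), \(e(y_i)\) is a learnable embedding of its class label \(y_i\), \(t\) is a temperature, and \(m\) is the batch size: the numerator pulls \(l(x_i)\) toward its class embedding and toward same-class samples, while the denominator pushes it away from all other samples. Following the abstract, the extended C-SupCon loss would plausibly add analogous positive terms \(\exp(l(x_i)^{\top} a_i / t)\) and \(\exp(l(x_i)^{\top} v_i / t)\) to both numerator and denominator, where \(a_i\) and \(v_i\) denote the CMCRL-pretrained audio embedding (data-to-source) and image embedding (data-to-target) of the same sample; the exact terms and their weighting in the paper may differ.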