{"title":"C-SupConGAN:使用对比学习和训练数据特征生成音频到图像","authors":"Haechun Chung, Jong-Kook Kim","doi":"10.1145/3582099.3582121","DOIUrl":null,"url":null,"abstract":"In this paper, the audio-to-image generation problem is investigated, where appropriate images are generated from the audio input. A previous study, Cross-Modal Contrastive Representation Learning (CMCRL), trained using both audios and images to extract useful audio features for audio-to-image generation. The CMCRL upgraded the Generative Adversarial Networks (GAN) to achieve high performance in the generation learning phase, but the GAN showed training instability. In this paper, the C-SupConGAN that uses the conditional supervised contrastive loss (C-SupCon loss) is proposed. C-SupConGAN enhances the conditional contrastive loss (2C loss) of the Contrastive GAN (ContraGAN) that considers data-to-data relationships and data-to-class relationships in the discriminator. The audio and image embeddings extracted from the encoder pre-trained using CMCRL is used to further extend the C-SupCon loss. The extended C-SupCon loss additionally considers relations information between data embedding and the corresponding audio embedding (data-to-source relationships) or between data embedding and the corresponding image embedding (data-to-target relationships). Extensive experiments show that the proposed method improved performance, generates higher quality images for audio-to-image generation than previous research, and effectively alleviates the training collapse of GAN.","PeriodicalId":222372,"journal":{"name":"Proceedings of the 2022 5th Artificial Intelligence and Cloud Computing Conference","volume":"143 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-12-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"C-SupConGAN: Using Contrastive Learning and Trained Data Features for Audio-to-Image Generation\",\"authors\":\"Haechun Chung, Jong-Kook Kim\",\"doi\":\"10.1145/3582099.3582121\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"In this paper, the audio-to-image generation problem is investigated, where appropriate images are generated from the audio input. A previous study, Cross-Modal Contrastive Representation Learning (CMCRL), trained using both audios and images to extract useful audio features for audio-to-image generation. The CMCRL upgraded the Generative Adversarial Networks (GAN) to achieve high performance in the generation learning phase, but the GAN showed training instability. In this paper, the C-SupConGAN that uses the conditional supervised contrastive loss (C-SupCon loss) is proposed. C-SupConGAN enhances the conditional contrastive loss (2C loss) of the Contrastive GAN (ContraGAN) that considers data-to-data relationships and data-to-class relationships in the discriminator. The audio and image embeddings extracted from the encoder pre-trained using CMCRL is used to further extend the C-SupCon loss. The extended C-SupCon loss additionally considers relations information between data embedding and the corresponding audio embedding (data-to-source relationships) or between data embedding and the corresponding image embedding (data-to-target relationships). 
Extensive experiments show that the proposed method improved performance, generates higher quality images for audio-to-image generation than previous research, and effectively alleviates the training collapse of GAN.\",\"PeriodicalId\":222372,\"journal\":{\"name\":\"Proceedings of the 2022 5th Artificial Intelligence and Cloud Computing Conference\",\"volume\":\"143 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2022-12-17\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the 2022 5th Artificial Intelligence and Cloud Computing Conference\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3582099.3582121\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 2022 5th Artificial Intelligence and Cloud Computing Conference","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3582099.3582121","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
C-SupConGAN: Using Contrastive Learning and Trained Data Features for Audio-to-Image Generation
In this paper, the audio-to-image generation problem is investigated, where appropriate images are generated from audio input. A previous study, Cross-Modal Contrastive Representation Learning (CMCRL), trained on both audio and images to extract audio features useful for audio-to-image generation. CMCRL upgraded the Generative Adversarial Network (GAN) to achieve high performance in the generation learning phase, but the GAN exhibited training instability. In this paper, C-SupConGAN, which uses a conditional supervised contrastive loss (C-SupCon loss), is proposed. C-SupConGAN enhances the conditional contrastive loss (2C loss) of the Contrastive GAN (ContraGAN), which considers data-to-data and data-to-class relationships in the discriminator. The audio and image embeddings extracted from the encoder pre-trained with CMCRL are used to further extend the C-SupCon loss. The extended C-SupCon loss additionally considers the relationship between a data embedding and its corresponding audio embedding (data-to-source) or between a data embedding and its corresponding image embedding (data-to-target). Extensive experiments show that the proposed method improves performance, generates higher-quality images for audio-to-image generation than previous work, and effectively alleviates GAN training collapse.
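To make the loss formulation above concrete, the following is a minimal PyTorch sketch of a SupCon-style conditional contrastive loss that treats a class-proxy embedding and an optional paired embedding as additional positives, in the spirit of the C-SupCon loss described here. This is an illustration under stated assumptions, not the authors' implementation: the function name, the argument shapes, the single shared embedding dimension, and the use of one `extra_pos` tensor to carry either the source (audio) or target (image) embedding are all hypothetical.

```python
# A hypothetical sketch of a conditional supervised contrastive loss,
# NOT the authors' released code. Assumes all embeddings share one
# dimension D and are comparable after L2 normalization.
import torch
import torch.nn.functional as F

def c_supcon_loss(feats, labels, class_emb, extra_pos=None, temperature=0.1):
    """feats:     (N, D) discriminator embeddings of the batch images
    labels:    (N,)   class indices
    class_emb: (C, D) class-proxy embeddings (learnable in practice)
    extra_pos: (N, D) optional paired embedding per sample, e.g. the
               CMCRL audio (data-to-source) or image (data-to-target)
               embedding, treated here as an extra positive
    """
    feats = F.normalize(feats, dim=1)
    proxies = F.normalize(class_emb, dim=1)[labels]            # (N, D)
    n = feats.size(0)

    # Data-to-data similarities; mask out self-pairs with -inf.
    sim = feats @ feats.t() / temperature                      # (N, N)
    self_mask = torch.eye(n, dtype=torch.bool, device=feats.device)
    sim = sim.masked_fill(self_mask, float('-inf'))

    # Data-to-class similarity: each sample against its own class proxy.
    proxy_sim = (feats * proxies).sum(dim=1, keepdim=True) / temperature

    logits = torch.cat([sim, proxy_sim], dim=1)                # (N, N+1)
    pos_mask = (labels.unsqueeze(0) == labels.unsqueeze(1)) & ~self_mask
    ones = torch.ones(n, 1, dtype=torch.bool, device=feats.device)
    pos_mask = torch.cat([pos_mask, ones], dim=1)              # proxy is always positive

    if extra_pos is not None:
        # Data-to-source / data-to-target term: the paired embedding
        # of each sample is one more positive in the same softmax.
        pair_sim = (feats * F.normalize(extra_pos, dim=1)).sum(
            dim=1, keepdim=True) / temperature
        logits = torch.cat([logits, pair_sim], dim=1)
        pos_mask = torch.cat([pos_mask, ones], dim=1)

    # SupCon-style: average the log-probability over every positive
    # of each anchor (the proxy column guarantees >= 1 positive).
    log_prob = logits - torch.logsumexp(logits, dim=1, keepdim=True)
    loss = -log_prob.masked_fill(~pos_mask, 0.0).sum(dim=1) / pos_mask.sum(dim=1)
    return loss.mean()

# Toy usage: 4 samples, 3 classes, 8-dim embeddings.
feats = torch.randn(4, 8)
labels = torch.tensor([0, 1, 0, 2])
class_emb = torch.randn(3, 8)   # would be an nn.Embedding in a real model
audio_emb = torch.randn(4, 8)   # e.g. frozen CMCRL audio embeddings
print(c_supcon_loss(feats, labels, class_emb, extra_pos=audio_emb))
```

Passing the pre-trained audio embedding as `extra_pos` would realize a data-to-source term, and passing the corresponding image embedding a data-to-target term; averaging over all positives, rather than keeping only same-class data terms in the numerator as the 2C loss does, is what makes the formulation SupCon-style.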