Generative Adversarial Networks with Learnable Auxiliary Module for Image Synthesis
Yan Gan, Chenxue Yang, Mao Ye, Renjie Huang, Deqiang Ouyang
ACM Transactions on Multimedia Computing, Communications and Applications · DOI: 10.1145/3653021 · Published 2024-03-17
Training generative adversarial networks (GANs) for noise-to-image synthesis is a challenging task, primarily because the GAN training process is unstable. One of the key issues is the generator's sensitivity to its input data, which can cause sudden fluctuations in the generator's loss value for certain inputs. This sensitivity indicates that the generator cannot adequately resist disturbances, which causes the discriminator's loss value to oscillate and degrades the discriminator. The degraded discriminator, in turn, provides poor feedback for updating the generator's parameters, leading to suboptimal image generation quality. In response to this challenge, we present a GAN model equipped with a learnable auxiliary module that processes auxiliary noise. The core objective of this module is to enhance the stability of both the generator and the discriminator throughout the training process. To achieve this, we incorporate a learnable auxiliary penalty and an augmented discriminator, designed to control the generator and reinforce the discriminator's stability, respectively. We further apply our method to the Hinge and LSGAN loss functions, illustrating its efficacy in reducing the instability of both the generator and the discriminator. Experiments on the LSUN, CelebA, Market-1501 and Creative Senz3D datasets demonstrate that our method improves the training stability and overall performance of the baseline methods.
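As background for the loss functions named above, the sketch below shows the standard Hinge and LSGAN objectives that the proposed auxiliary module is applied to. It is a minimal illustration, not the paper's implementation: the use of PyTorch, the function names, and the target values are assumptions for clarity, and the learnable auxiliary penalty and augmented discriminator described in the abstract are not reproduced here.

```python
# Minimal sketch of the two baseline GAN objectives (Hinge and LSGAN).
# Assumptions: PyTorch, real/fake logits are unbounded discriminator outputs.
import torch
import torch.nn.functional as F


def hinge_d_loss(real_logits: torch.Tensor, fake_logits: torch.Tensor) -> torch.Tensor:
    """Hinge discriminator loss: penalize real logits below +1 and fake logits above -1."""
    return torch.mean(F.relu(1.0 - real_logits)) + torch.mean(F.relu(1.0 + fake_logits))


def hinge_g_loss(fake_logits: torch.Tensor) -> torch.Tensor:
    """Hinge generator loss: push the discriminator's score on generated samples upward."""
    return -torch.mean(fake_logits)


def lsgan_d_loss(real_logits: torch.Tensor, fake_logits: torch.Tensor,
                 a: float = 0.0, b: float = 1.0) -> torch.Tensor:
    """LSGAN discriminator loss: least-squares regression of real logits to b and fake logits to a."""
    return 0.5 * torch.mean((real_logits - b) ** 2) + 0.5 * torch.mean((fake_logits - a) ** 2)


def lsgan_g_loss(fake_logits: torch.Tensor, c: float = 1.0) -> torch.Tensor:
    """LSGAN generator loss: regress logits of generated samples toward the 'real' target c."""
    return 0.5 * torch.mean((fake_logits - c) ** 2)
```

In the paper's setting, a stabilizing term (the learnable auxiliary penalty) would be added on top of one of these generator objectives; its exact form is specific to the paper and is therefore omitted here.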
Journal Introduction:
The ACM Transactions on Multimedia Computing, Communications, and Applications is the flagship publication of the ACM Special Interest Group in Multimedia (SIGMM). It is soliciting paper submissions on all aspects of multimedia. Papers on single media (for instance, audio, video, animation) and their processing are also welcome.
TOMM is a peer-reviewed, archival journal, available in both print and digital form. The journal is published quarterly, with roughly seven 23-page articles in each issue. In addition, all Special Issues are published online-only to ensure timely publication. The transactions consist primarily of research papers. This is an archival journal, and it is intended that the papers will have lasting importance and value over time. In general, papers whose primary focus is on particular multimedia products or the current state of the industry will not be included.