自适应遗忘，起草和综合指导:文本到图像的合成与层次生成对抗网络

Embedded Systems and Applications Pub Date : 2022-03-26 DOI:10.5121/csit.2022.120623

Yuting Xue, Heng Zhou, Yuxuan Ding, Xiao Shan

{"title":"自适应遗忘，起草和综合指导:文本到图像的合成与层次生成对抗网络","authors":"Yuting Xue, Heng Zhou, Yuxuan Ding, Xiao Shan","doi":"10.5121/csit.2022.120623","DOIUrl":null,"url":null,"abstract":"The generation task from text to image generates cross modal data with consistent content by mining the semantic consistency contained in two different modal information of text and image. Due to the differences between the two modes, the task of text to image generation faces many difficulties and challenges. In this paper, we propose to boost the text-to-image synthesis through an adaptive learning and generating generative adversarial networks (ALG-GANs). First, we propose an adaptive forgetting mechanism in the generator to reduce the error accumulation and learn knowledge flexibly in the cascade structure. Besides, to evade the mode collapse caused by a strong biased surveillance, we propose a multi-task discriminator using weaksupervision information to guide the generator more comprehensively and maintain the semantic consistency in the cascade generation process. To avoid the refine difficulty aroused by the bad initialization, we judge the quality of initialization before further processing. The generator will re-sample the noise and re-initialize the bad initializations to obtain good ones. All the above contributions have been integrated in a unified framework, which is an adaptive forgetting, drafting and comprehensive guiding based text-to-image synthesis method with hierarchical generative adversarial networks. The model is evaluated on the Caltech-UCSD Birds 200 (CUB) dataset and the Oxford 102 Category Flowers (Oxford) dataset with standard metrics. The results on Inception Score (IS) and Fréchet Inception Distance (FID) show that our model outperforms the previous methods.","PeriodicalId":201778,"journal":{"name":"Embedded Systems and Applications","volume":"73 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-03-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Adaptive Forgetting, Drafting and Comprehensive Guiding: Text-to-Image Synthesis with Hierarchical Generative Adversarial Networks\",\"authors\":\"Yuting Xue, Heng Zhou, Yuxuan Ding, Xiao Shan\",\"doi\":\"10.5121/csit.2022.120623\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"The generation task from text to image generates cross modal data with consistent content by mining the semantic consistency contained in two different modal information of text and image. Due to the differences between the two modes, the task of text to image generation faces many difficulties and challenges. In this paper, we propose to boost the text-to-image synthesis through an adaptive learning and generating generative adversarial networks (ALG-GANs). First, we propose an adaptive forgetting mechanism in the generator to reduce the error accumulation and learn knowledge flexibly in the cascade structure. Besides, to evade the mode collapse caused by a strong biased surveillance, we propose a multi-task discriminator using weaksupervision information to guide the generator more comprehensively and maintain the semantic consistency in the cascade generation process. To avoid the refine difficulty aroused by the bad initialization, we judge the quality of initialization before further processing. The generator will re-sample the noise and re-initialize the bad initializations to obtain good ones. All the above contributions have been integrated in a unified framework, which is an adaptive forgetting, drafting and comprehensive guiding based text-to-image synthesis method with hierarchical generative adversarial networks. The model is evaluated on the Caltech-UCSD Birds 200 (CUB) dataset and the Oxford 102 Category Flowers (Oxford) dataset with standard metrics. The results on Inception Score (IS) and Fréchet Inception Distance (FID) show that our model outperforms the previous methods.\",\"PeriodicalId\":201778,\"journal\":{\"name\":\"Embedded Systems and Applications\",\"volume\":\"73 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2022-03-26\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Embedded Systems and Applications\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.5121/csit.2022.120623\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Embedded Systems and Applications","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.5121/csit.2022.120623","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 0

摘要

文本到图像的生成任务通过挖掘文本和图像两种不同模态信息中包含的语义一致性，生成内容一致的跨模态数据。由于两种模式的差异，文本到图像的生成任务面临着许多困难和挑战。在本文中，我们提出通过自适应学习和生成生成对抗网络(algans)来促进文本到图像的合成。首先，我们在生成器中提出了自适应遗忘机制，以减少串级结构中的错误积累，灵活地学习知识。此外，为了避免强偏差监视导致的模式崩溃，我们提出了一种利用弱监督信息的多任务鉴别器，在级联生成过程中更全面地引导生成器并保持语义一致性。为了避免初始化不良带来的细化困难，我们在进一步加工前对初始化质量进行了判断。生成器将重新采样噪声并重新初始化不良初始化以获得良好的初始化。将上述贡献整合到一个统一的框架中，即基于自适应遗忘、起草和综合指导的文本图像合成方法。该模型在Caltech-UCSD Birds 200 (CUB)数据集和Oxford 102 Category Flowers (Oxford)数据集上进行了标准指标的评估。在Inception Score (IS)和fr Inception Distance (FID)上的结果表明，我们的模型优于以前的方法。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Adaptive Forgetting, Drafting and Comprehensive Guiding: Text-to-Image Synthesis with Hierarchical Generative Adversarial Networks

The generation task from text to image generates cross modal data with consistent content by mining the semantic consistency contained in two different modal information of text and image. Due to the differences between the two modes, the task of text to image generation faces many difficulties and challenges. In this paper, we propose to boost the text-to-image synthesis through an adaptive learning and generating generative adversarial networks (ALG-GANs). First, we propose an adaptive forgetting mechanism in the generator to reduce the error accumulation and learn knowledge flexibly in the cascade structure. Besides, to evade the mode collapse caused by a strong biased surveillance, we propose a multi-task discriminator using weaksupervision information to guide the generator more comprehensively and maintain the semantic consistency in the cascade generation process. To avoid the refine difficulty aroused by the bad initialization, we judge the quality of initialization before further processing. The generator will re-sample the noise and re-initialize the bad initializations to obtain good ones. All the above contributions have been integrated in a unified framework, which is an adaptive forgetting, drafting and comprehensive guiding based text-to-image synthesis method with hierarchical generative adversarial networks. The model is evaluated on the Caltech-UCSD Birds 200 (CUB) dataset and the Oxford 102 Category Flowers (Oxford) dataset with standard metrics. The results on Inception Score (IS) and Fréchet Inception Distance (FID) show that our model outperforms the previous methods.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

Embedded Systems and Applications

自引率

0.00%

发文量