Multi-Stage Hybrid Text-to-Image Generation Models
Razan Bayoumi, Marco Alfonse, Abdel-Badeeh M. Salem
International Journal of Intelligent Computing and Information Sciences, vol. 4, no. 1, August 2022. DOI: 10.21608/ijicis.2022.117124.1157

Abstract: Generative Adversarial Networks (GANs) have demonstrated outstanding potential for creating realistic images that are hard to distinguish from real ones, but text-to-image (conditional) generation still faces challenges. In this paper, we propose a new model, the Attentional Dynamic Memory Generative Adversarial Network (AttnDM-GAN), which seeks to generate realistic output that is semantically consistent with an input text description. AttnDM-GAN is a three-stage hybrid of the Attentional Generative Adversarial Network (AttnGAN) and the Dynamic Memory Generative Adversarial Network (DM-GAN). The first stage, Initial Image Generation, produces low-resolution 64x64 images conditioned on the encoded input text description. The second stage, Attention Image Generation, generates higher-resolution 128x128 images, and the last stage, Dynamic Memory Based Image Refinement, refines the images to 256x256 resolution. We evaluate AttnDM-GAN on the Caltech-UCSD Birds 200 dataset using the Fréchet Inception Distance (FID), obtaining a value of 19.78. We also propose a variant called the Dynamic Memory Attention Generative Adversarial Network (DMAttn-GAN), in which the second and third stages are swapped; its FID value is 17.04.
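Below is a minimal PyTorch sketch of the three-stage data flow described in the abstract: a text encoder feeds an initial 64x64 generator, whose output is refined to 128x128 and then 256x256. All module internals here are placeholders (the actual AttnGAN word-level attention and DM-GAN memory writing/reading mechanisms are omitted); only the stage boundaries and output resolutions follow the abstract.

import torch
import torch.nn as nn

class TextEncoder(nn.Module):
    # Placeholder encoder: returns word-level features and a sentence vector.
    # (The paper uses a pretrained text encoder; this is a stand-in.)
    def __init__(self, vocab_size=5000, dim=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)

    def forward(self, tokens):
        word_feats = self.embed(tokens)        # (B, T, dim)
        return word_feats, word_feats.mean(1)  # sentence vector (B, dim)

class InitialStage(nn.Module):
    # Stage 1: noise + sentence feature -> low-resolution 64x64 image.
    def __init__(self, z_dim=100, cond_dim=256):
        super().__init__()
        self.fc = nn.Linear(z_dim + cond_dim, 512 * 4 * 4)
        self.to_img = nn.Sequential(
            nn.Upsample(scale_factor=16, mode="nearest"),  # 4x4 -> 64x64
            nn.Conv2d(512, 3, 3, padding=1), nn.Tanh())

    def forward(self, z, sent_feat):
        h = self.fc(torch.cat([z, sent_feat], 1)).view(-1, 512, 4, 4)
        return self.to_img(h)                  # (B, 3, 64, 64)

class RefineStage(nn.Module):
    # Stands in for both the attention stage (64 -> 128) and the dynamic
    # memory stage (128 -> 256); word-feature conditioning is not modeled.
    def __init__(self):
        super().__init__()
        self.up = nn.Sequential(
            nn.Upsample(scale_factor=2, mode="nearest"),
            nn.Conv2d(3, 3, 3, padding=1), nn.Tanh())

    def forward(self, img, word_feats):
        return self.up(img)

# Wire the three stages together as described in the abstract.
encoder = TextEncoder()
stage1, stage2, stage3 = InitialStage(), RefineStage(), RefineStage()
tokens = torch.randint(0, 5000, (2, 16))       # two captions, 16 tokens each
word_feats, sent_feat = encoder(tokens)
img64 = stage1(torch.randn(2, 100), sent_feat)
img128 = stage2(img64, word_feats)             # attention-based refinement
img256 = stage3(img128, word_feats)            # dynamic-memory refinement
print(img64.shape, img128.shape, img256.shape)

In the DMAttn-GAN variant described above, the dynamic-memory refinement would be applied at the 64->128 step and the attention-based refinement at the 128->256 step; in this skeleton that simply corresponds to swapping the roles of stage2 and stage3.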
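The reported FID scores (19.78 for AttnDM-GAN, 17.04 for DMAttn-GAN, lower is better) compare Inception-feature statistics of generated and real images. A minimal sketch of how such a score is typically computed with the torchmetrics library, using random stand-in tensors rather than the authors' actual evaluation pipeline:

import torch
from torchmetrics.image.fid import FrechetInceptionDistance

# Default settings expect uint8 images in [0, 255] with shape (N, 3, H, W).
fid = FrechetInceptionDistance(feature=2048)

# Stand-in batches: in practice these would be CUB-200 bird photos and the
# 256x256 outputs of the final generation stage.
real_images = torch.randint(0, 256, (16, 3, 256, 256), dtype=torch.uint8)
fake_images = torch.randint(0, 256, (16, 3, 256, 256), dtype=torch.uint8)

fid.update(real_images, real=True)
fid.update(fake_images, real=False)
print(f"FID: {fid.compute():.2f}")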