Multi-Stage Hybrid Text-to-Image Generation Models
Razan Bayoumi, Marco Alfonse, Abdel-Badeeh M. Salem
International Journal of Intelligent Computing and Information Sciences, vol. 4, no. 1, August 2022. DOI: 10.21608/ijicis.2022.117124.1157

Abstract: Generative Adversarial Networks (GANs) have demonstrated outstanding potential for creating realistic images that are hard to distinguish from real ones, but text-to-image (conditional) generation still faces challenges. In this paper, we propose a new model, the Attentional Dynamic Memory Generative Adversarial Network (AttnDM-GAN), which seeks to generate realistic output that is semantically consistent with an input text description. AttnDM-GAN is a three-stage hybrid of the Attentional Generative Adversarial Network (AttnGAN) and the Dynamic Memory Generative Adversarial Network (DM-GAN). The first stage, Initial Image Generation, produces low-resolution 64x64 images conditioned on the encoded input text description. The second stage, Attention Image Generation, generates higher-resolution 128x128 images, and the last stage, Dynamic Memory Based Image Refinement, refines the images to 256x256 resolution. We evaluate AttnDM-GAN on the Caltech-UCSD Birds 200 dataset using the Fréchet Inception Distance (FID), obtaining a value of 19.78. We also propose a variant called the Dynamic Memory Attention Generative Adversarial Network (DMAttn-GAN), in which the second and third stages are swapped; its FID value is 17.04.
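Below is a minimal PyTorch sketch of the three-stage data flow described in the abstract: a text encoder feeds an initial 64x64 generator, whose output is refined to 128x128 and then 256x256. All module internals here are placeholders (the actual AttnGAN word-level attention and DM-GAN memory writing/reading mechanisms are omitted); only the stage boundaries and output resolutions follow the abstract.

import torch
import torch.nn as nn

class TextEncoder(nn.Module):
    # Placeholder encoder: returns word-level features and a sentence vector.
    # (The paper uses a pretrained text encoder; this is a stand-in.)
    def __init__(self, vocab_size=5000, dim=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)

    def forward(self, tokens):
        word_feats = self.embed(tokens)        # (B, T, dim)
        return word_feats, word_feats.mean(1)  # sentence vector (B, dim)

class InitialStage(nn.Module):
    # Stage 1: noise + sentence feature -> low-resolution 64x64 image.
    def __init__(self, z_dim=100, cond_dim=256):
        super().__init__()
        self.fc = nn.Linear(z_dim + cond_dim, 512 * 4 * 4)
        self.to_img = nn.Sequential(
            nn.Upsample(scale_factor=16, mode="nearest"),  # 4x4 -> 64x64
            nn.Conv2d(512, 3, 3, padding=1), nn.Tanh())

    def forward(self, z, sent_feat):
        h = self.fc(torch.cat([z, sent_feat], 1)).view(-1, 512, 4, 4)
        return self.to_img(h)                  # (B, 3, 64, 64)

class RefineStage(nn.Module):
    # Stands in for both the attention stage (64 -> 128) and the dynamic
    # memory stage (128 -> 256); word-feature conditioning is not modeled.
    def __init__(self):
        super().__init__()
        self.up = nn.Sequential(
            nn.Upsample(scale_factor=2, mode="nearest"),
            nn.Conv2d(3, 3, 3, padding=1), nn.Tanh())

    def forward(self, img, word_feats):
        return self.up(img)

# Wire the three stages together as described in the abstract.
encoder = TextEncoder()
stage1, stage2, stage3 = InitialStage(), RefineStage(), RefineStage()
tokens = torch.randint(0, 5000, (2, 16))       # two captions, 16 tokens each
word_feats, sent_feat = encoder(tokens)
img64 = stage1(torch.randn(2, 100), sent_feat)
img128 = stage2(img64, word_feats)             # attention-based refinement
img256 = stage3(img128, word_feats)            # dynamic-memory refinement
print(img64.shape, img128.shape, img256.shape)

In the DMAttn-GAN variant described above, the dynamic-memory refinement would be applied at the 64->128 step and the attention-based refinement at the 128->256 step; in this skeleton that simply corresponds to swapping the roles of stage2 and stage3.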
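The reported FID scores (19.78 for AttnDM-GAN, 17.04 for DMAttn-GAN, lower is better) compare Inception-feature statistics of generated and real images. A minimal sketch of how such a score is typically computed with the torchmetrics library, using random stand-in tensors rather than the authors' actual evaluation pipeline:

import torch
from torchmetrics.image.fid import FrechetInceptionDistance

# Default settings expect uint8 images in [0, 255] with shape (N, 3, H, W).
fid = FrechetInceptionDistance(feature=2048)

# Stand-in batches: in practice these would be CUB-200 bird photos and the
# 256x256 outputs of the final generation stage.
real_images = torch.randint(0, 256, (16, 3, 256, 256), dtype=torch.uint8)
fake_images = torch.randint(0, 256, (16, 3, 256, 256), dtype=torch.uint8)

fid.update(real_images, real=True)
fid.update(fake_images, real=False)
print(f"FID: {fid.compute():.2f}")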