SemGAN: Text to Image Synthesis from Text Semantics using Attentional Generative Adversarial Networks
Ammar Nasr Abdallah Khairi, Ruba Mutasim, Hiba Imam
2020 International Conference on Computer, Control, Electrical, and Electronics Engineering (ICCCEEE), published 2021-02-26
DOI: 10.1109/ICCCEEE49695.2021.9429602
Abstract
Text-to-image synthesis is the task of automatically creating a realistic image from a given text description. It has numerous innovative and practical applications, including image processing and computer-aided design. Using Generative Adversarial Networks (GANs) alongside the attention mechanism has recently led to substantial improvements. The fine-grained attention mechanism, although powerful, does not preserve the overall description information well in the generator, since it attends to the text description only at the word level (fine-grained). We propose incorporating whole-sentence semantics when generating images from captions to enhance the outputs of the attention mechanism. According to our experiments, our model produces more robust images with a better semantic layout. We use the Caltech birds dataset to run experiments on both models and validate the effectiveness of our proposal. Our model improves on the original AttnGAN by +4.13% in Inception Score and +13.93% in Fréchet Inception Distance. Moreover, an empirical analysis is carried out on the objective and subjective measures to: (i) address and overcome the limitations of these metrics, and (ii) verify that the performance improvements are due to fundamental algorithmic changes rather than to initialization and fine-tuning, as is common with GAN models.
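To make the proposed change concrete, the sketch below shows one way an AttnGAN-style word-level attention module could be extended with a whole-sentence embedding. This is a minimal illustrative sketch under our own assumptions, not the authors' implementation: the class name SentenceAwareAttention, the additive fusion of word and sentence context, and all dimensions are hypothetical.

import torch
import torch.nn as nn
import torch.nn.functional as F

class SentenceAwareAttention(nn.Module):
    """Word-level attention (as in AttnGAN) plus a global sentence context.

    Illustrative sketch only; the fusion strategy and dimensions are assumptions.
    """
    def __init__(self, img_dim, word_dim, sent_dim):
        super().__init__()
        self.word_proj = nn.Linear(word_dim, img_dim)  # map word features into image feature space
        self.sent_proj = nn.Linear(sent_dim, img_dim)  # map sentence embedding into image feature space

    def forward(self, img_feats, word_feats, sent_emb):
        # img_feats:  (B, N, img_dim)  flattened spatial image features
        # word_feats: (B, T, word_dim) per-word text encoder features
        # sent_emb:   (B, sent_dim)    whole-sentence embedding
        words = self.word_proj(word_feats)                    # (B, T, img_dim)
        attn = torch.bmm(img_feats, words.transpose(1, 2))    # (B, N, T) image-word similarity
        attn = F.softmax(attn, dim=-1)                        # attend over words for each region
        word_context = torch.bmm(attn, words)                 # (B, N, img_dim) fine-grained context
        sent_context = self.sent_proj(sent_emb).unsqueeze(1)  # (B, 1, img_dim) global context
        # fuse fine-grained word context with whole-sentence semantics
        return img_feats + word_context + sent_context

# Example usage with assumed sizes (batch 2, 17x17 spatial grid, 18 words):
attn = SentenceAwareAttention(img_dim=64, word_dim=256, sent_dim=256)
out = attn(torch.randn(2, 17 * 17, 64), torch.randn(2, 18, 256), torch.randn(2, 256))

The intent of such a design is that each spatial region still attends to individual words, while the added sentence term keeps the overall description visible to the generator at every stage.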