{"title":"CcGL-GAN:用于文本到图像合成的交叉注意和全局-局部鉴别器生成对抗网络","authors":"Xihong Ye, Luanhao Lu","doi":"10.1109/IJCNN52387.2021.9533396","DOIUrl":null,"url":null,"abstract":"Text-to-image synthesis aims to generate a visually realistic image according to a linguistic text description. Visual quality and semantic consistency are two key objectives. Although remarkable progress has been made in improving visual resolutions leveraging Generative Adversarial Networks (GANs), guaranteeing the semantic conformity remains challenging. In this paper, we address it by proposing a novel Criss-Cross Attention and Global-Local Discriminator Generative Adversarial Networks(CcGL-GAN). CcGL-GAN exploits a Criss-Cross Attention mechanism to capture the variation of contextual description, which enables back generators to generate images more efficiently. Moreover, it utilizes Global-Local discriminators to project low-resolution images onto global linguistic representations, and high-resolution images onto local linguistic representations, which ensures that our model narrows the gap between images and descriptions. Experiments conducted on two publicly available datasets, the CUB and Oxford-102, demonstrate the effectiveness of the proposed CcGL-GAN model.","PeriodicalId":396583,"journal":{"name":"2021 International Joint Conference on Neural Networks (IJCNN)","volume":"60 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-07-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"CcGL-GAN: Criss-Cross Attention and Global-Local Discriminator Generative Adversarial Networks for text-to-image synthesis\",\"authors\":\"Xihong Ye, Luanhao Lu\",\"doi\":\"10.1109/IJCNN52387.2021.9533396\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Text-to-image synthesis aims to generate a visually realistic image according to a linguistic text description. Visual quality and semantic consistency are two key objectives. Although remarkable progress has been made in improving visual resolutions leveraging Generative Adversarial Networks (GANs), guaranteeing the semantic conformity remains challenging. In this paper, we address it by proposing a novel Criss-Cross Attention and Global-Local Discriminator Generative Adversarial Networks(CcGL-GAN). CcGL-GAN exploits a Criss-Cross Attention mechanism to capture the variation of contextual description, which enables back generators to generate images more efficiently. Moreover, it utilizes Global-Local discriminators to project low-resolution images onto global linguistic representations, and high-resolution images onto local linguistic representations, which ensures that our model narrows the gap between images and descriptions. 
Experiments conducted on two publicly available datasets, the CUB and Oxford-102, demonstrate the effectiveness of the proposed CcGL-GAN model.\",\"PeriodicalId\":396583,\"journal\":{\"name\":\"2021 International Joint Conference on Neural Networks (IJCNN)\",\"volume\":\"60 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2021-07-18\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2021 International Joint Conference on Neural Networks (IJCNN)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/IJCNN52387.2021.9533396\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2021 International Joint Conference on Neural Networks (IJCNN)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/IJCNN52387.2021.9533396","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
CcGL-GAN: Criss-Cross Attention and Global-Local Discriminator Generative Adversarial Networks for text-to-image synthesis
Text-to-image synthesis aims to generate a visually realistic image from a linguistic text description. Visual quality and semantic consistency are the two key objectives. Although remarkable progress has been made in improving visual resolution by leveraging Generative Adversarial Networks (GANs), guaranteeing semantic consistency remains challenging. In this paper, we address this problem by proposing a novel Criss-Cross Attention and Global-Local Discriminator Generative Adversarial Network (CcGL-GAN). CcGL-GAN exploits a Criss-Cross Attention mechanism to capture variations in the contextual description, which enables the later-stage generators to generate images more efficiently. Moreover, it utilizes Global-Local discriminators to project low-resolution images onto global linguistic representations and high-resolution images onto local linguistic representations, ensuring that the model narrows the gap between images and descriptions. Experiments conducted on two publicly available datasets, CUB and Oxford-102, demonstrate the effectiveness of the proposed CcGL-GAN model.
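The abstract names criss-cross attention as the core mechanism but does not give the implementation. The sketch below is a minimal, generic criss-cross attention module in PyTorch, in the spirit of CCNet: each spatial position attends only to the positions in its own row and column rather than to the full H×W map. The layer names, the channel-reduction factor of 8, the learned residual weight `gamma`, and the feature-map shapes are illustrative assumptions, not the authors' code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class CrissCrossAttention(nn.Module):
    """Generic criss-cross attention: every position attends to the
    positions in its own row and column of the feature map."""

    def __init__(self, in_channels: int, reduction: int = 8):
        super().__init__()
        self.query = nn.Conv2d(in_channels, in_channels // reduction, kernel_size=1)
        self.key = nn.Conv2d(in_channels, in_channels // reduction, kernel_size=1)
        self.value = nn.Conv2d(in_channels, in_channels, kernel_size=1)
        self.gamma = nn.Parameter(torch.zeros(1))  # learned residual weight

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        q, k, v = self.query(x), self.key(x), self.value(x)

        # Column (vertical) energies: for each column, attend along H.
        q_col = q.permute(0, 3, 2, 1).reshape(b * w, h, -1)       # (b*w, h, c')
        k_col = k.permute(0, 3, 1, 2).reshape(b * w, -1, h)       # (b*w, c', h)
        e_col = torch.bmm(q_col, k_col).reshape(b, w, h, h)       # (b, w, h, h)
        e_col = e_col.permute(0, 2, 1, 3)                         # (b, h, w, h)

        # Row (horizontal) energies: for each row, attend along W.
        q_row = q.permute(0, 2, 3, 1).reshape(b * h, w, -1)       # (b*h, w, c')
        k_row = k.permute(0, 2, 1, 3).reshape(b * h, -1, w)       # (b*h, c', w)
        e_row = torch.bmm(q_row, k_row).reshape(b, h, w, w)       # (b, h, w, w)

        # Joint softmax over each pixel's column + row candidates.
        # (The pixel itself appears in both sets here; the CCNet reference
        # masks one copy, which is omitted for brevity.)
        attn = F.softmax(torch.cat([e_col, e_row], dim=-1), dim=-1)
        a_col, a_row = attn[..., :h], attn[..., h:]

        # Aggregate values along the column ...
        v_col = v.permute(0, 3, 1, 2).reshape(b * w, c, h)        # (b*w, c, h)
        a_col = a_col.permute(0, 2, 1, 3).reshape(b * w, h, h)    # (b*w, h_out, h_in)
        out_col = torch.bmm(v_col, a_col.transpose(1, 2))         # (b*w, c, h_out)
        out_col = out_col.reshape(b, w, c, h).permute(0, 2, 3, 1) # (b, c, h, w)

        # ... and along the row.
        v_row = v.permute(0, 2, 1, 3).reshape(b * h, c, w)        # (b*h, c, w)
        a_row = a_row.reshape(b * h, w, w)                        # (b*h, w_out, w_in)
        out_row = torch.bmm(v_row, a_row.transpose(1, 2))         # (b*h, c, w_out)
        out_row = out_row.reshape(b, h, c, w).permute(0, 2, 1, 3) # (b, c, h, w)

        return self.gamma * (out_col + out_row) + x


if __name__ == "__main__":
    # Hypothetical usage on a 64x64 feature map with 128 channels.
    feat = torch.randn(2, 128, 64, 64)
    out = CrissCrossAttention(128)(feat)
    print(out.shape)  # torch.Size([2, 128, 64, 64])
```

In the abstract's terms, a discriminator in the same spirit as the Global-Local design would score a low-resolution image against the global (sentence-level) text representation and a high-resolution image against local (word-level) representations; the sketch above covers only the attention module, not that matching scheme.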