CcGL-GAN: Criss-Cross Attention and Global-Local Discriminator Generative Adversarial Networks for text-to-image synthesis

Xihong Ye, Luanhao Lu
2021 International Joint Conference on Neural Networks (IJCNN), published 2021-07-18
DOI: 10.1109/IJCNN52387.2021.9533396
Abstract
Text-to-image synthesis aims to generate a visually realistic image from a linguistic text description. Visual quality and semantic consistency are the two key objectives. Although remarkable progress has been made in improving visual resolution by leveraging Generative Adversarial Networks (GANs), guaranteeing semantic consistency remains challenging. In this paper, we address this by proposing a novel Criss-Cross Attention and Global-Local Discriminator Generative Adversarial Network (CcGL-GAN). CcGL-GAN exploits a Criss-Cross Attention mechanism to capture variations in the contextual description, which enables the later-stage generators to generate images more efficiently. Moreover, it utilizes Global-Local discriminators to project low-resolution images onto global linguistic representations and high-resolution images onto local linguistic representations, which ensures that our model narrows the gap between images and descriptions. Experiments conducted on two publicly available datasets, CUB and Oxford-102, demonstrate the effectiveness of the proposed CcGL-GAN model.
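The criss-cross attention the abstract refers to restricts each spatial position to attend only to the positions in its own row and column, rather than the full feature map, which cuts the attention cost from O((HW)^2) to O(HW(H+W)). The paper does not give implementation details, so the following is a minimal PyTorch sketch of the general criss-cross mechanism (in the style of CCNet); the module name, reduction factor, and the learnable `gamma` gate are illustrative assumptions, not the authors' exact design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CrissCrossAttention(nn.Module):
    """Sketch: each position attends over its row (W positions)
    and its column (H positions) with one joint softmax."""

    def __init__(self, channels: int, reduction: int = 8):
        super().__init__()
        # 1x1 convs produce query/key in a reduced channel space (assumed).
        self.query = nn.Conv2d(channels, channels // reduction, 1)
        self.key = nn.Conv2d(channels, channels // reduction, 1)
        self.value = nn.Conv2d(channels, channels, 1)
        # Learnable residual gate, initialized to 0 (assumed).
        self.gamma = nn.Parameter(torch.zeros(1))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        q, k, v = self.query(x), self.key(x), self.value(x)
        # Row energies: for each row, scores over its W positions -> (b, h, w, w)
        e_row = torch.matmul(q.permute(0, 2, 3, 1), k.permute(0, 2, 1, 3))
        # Column energies: for each column, scores over its H positions -> (b, w, h, h)
        e_col = torch.matmul(q.permute(0, 3, 2, 1), k.permute(0, 3, 1, 2))
        e_col = e_col.permute(0, 2, 1, 3)  # align to (b, h, w, h)
        # Joint softmax over the H + W criss-cross positions.
        attn = F.softmax(torch.cat([e_row, e_col], dim=-1), dim=-1)
        a_row, a_col = attn[..., :w], attn[..., w:]
        # Aggregate values along the row and along the column.
        out_row = torch.matmul(a_row, v.permute(0, 2, 3, 1))                  # (b, h, w, c)
        out_col = torch.matmul(a_col.permute(0, 2, 1, 3), v.permute(0, 3, 2, 1))  # (b, w, h, c)
        out = (out_row + out_col.permute(0, 2, 1, 3)).permute(0, 3, 1, 2)     # (b, c, h, w)
        return self.gamma * out + x
```

In CCNet the module is typically applied twice (recurrently) so that information propagates to all positions; a single pass, as sketched here, only mixes each position with its own row and column.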