{"title":"CcGL-GAN:用于文本到图像合成的交叉注意和全局-局部鉴别器生成对抗网络","authors":"Xihong Ye, Luanhao Lu","doi":"10.1109/IJCNN52387.2021.9533396","DOIUrl":null,"url":null,"abstract":"Text-to-image synthesis aims to generate a visually realistic image according to a linguistic text description. Visual quality and semantic consistency are two key objectives. Although remarkable progress has been made in improving visual resolutions leveraging Generative Adversarial Networks (GANs), guaranteeing the semantic conformity remains challenging. In this paper, we address it by proposing a novel Criss-Cross Attention and Global-Local Discriminator Generative Adversarial Networks(CcGL-GAN). CcGL-GAN exploits a Criss-Cross Attention mechanism to capture the variation of contextual description, which enables back generators to generate images more efficiently. Moreover, it utilizes Global-Local discriminators to project low-resolution images onto global linguistic representations, and high-resolution images onto local linguistic representations, which ensures that our model narrows the gap between images and descriptions. Experiments conducted on two publicly available datasets, the CUB and Oxford-102, demonstrate the effectiveness of the proposed CcGL-GAN model.","PeriodicalId":396583,"journal":{"name":"2021 International Joint Conference on Neural Networks (IJCNN)","volume":"60 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-07-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"CcGL-GAN: Criss-Cross Attention and Global-Local Discriminator Generative Adversarial Networks for text-to-image synthesis\",\"authors\":\"Xihong Ye, Luanhao Lu\",\"doi\":\"10.1109/IJCNN52387.2021.9533396\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Text-to-image synthesis aims to generate a visually realistic image according to a linguistic text description. Visual quality and semantic consistency are two key objectives. Although remarkable progress has been made in improving visual resolutions leveraging Generative Adversarial Networks (GANs), guaranteeing the semantic conformity remains challenging. In this paper, we address it by proposing a novel Criss-Cross Attention and Global-Local Discriminator Generative Adversarial Networks(CcGL-GAN). CcGL-GAN exploits a Criss-Cross Attention mechanism to capture the variation of contextual description, which enables back generators to generate images more efficiently. Moreover, it utilizes Global-Local discriminators to project low-resolution images onto global linguistic representations, and high-resolution images onto local linguistic representations, which ensures that our model narrows the gap between images and descriptions. 
Experiments conducted on two publicly available datasets, the CUB and Oxford-102, demonstrate the effectiveness of the proposed CcGL-GAN model.\",\"PeriodicalId\":396583,\"journal\":{\"name\":\"2021 International Joint Conference on Neural Networks (IJCNN)\",\"volume\":\"60 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2021-07-18\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2021 International Joint Conference on Neural Networks (IJCNN)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/IJCNN52387.2021.9533396\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2021 International Joint Conference on Neural Networks (IJCNN)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/IJCNN52387.2021.9533396","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
CcGL-GAN: Criss-Cross Attention and Global-Local Discriminator Generative Adversarial Networks for text-to-image synthesis
Text-to-image synthesis aims to generate a visually realistic image from a linguistic text description. Visual quality and semantic consistency are the two key objectives. Although remarkable progress has been made in improving visual resolution by leveraging Generative Adversarial Networks (GANs), guaranteeing semantic consistency remains challenging. In this paper, we address this problem by proposing a novel Criss-Cross Attention and Global-Local Discriminator Generative Adversarial Network (CcGL-GAN). CcGL-GAN exploits a Criss-Cross Attention mechanism to capture variations in the contextual description, which enables the later-stage generators to generate images more efficiently. Moreover, it utilizes Global-Local discriminators to project low-resolution images onto global linguistic representations and high-resolution images onto local linguistic representations, ensuring that the model narrows the gap between images and descriptions. Experiments conducted on two publicly available datasets, CUB and Oxford-102, demonstrate the effectiveness of the proposed CcGL-GAN model.
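The abstract names criss-cross attention as the core mechanism but does not give the implementation. The sketch below is a minimal, generic criss-cross attention module in PyTorch, in the spirit of CCNet: each spatial position attends only to the positions in its own row and column rather than to the full H×W map. The layer names, the channel-reduction factor of 8, the learned residual weight `gamma`, and the feature-map shapes are illustrative assumptions, not the authors' code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class CrissCrossAttention(nn.Module):
    """Generic criss-cross attention: every position attends to the
    positions in its own row and column of the feature map."""

    def __init__(self, in_channels: int, reduction: int = 8):
        super().__init__()
        self.query = nn.Conv2d(in_channels, in_channels // reduction, kernel_size=1)
        self.key = nn.Conv2d(in_channels, in_channels // reduction, kernel_size=1)
        self.value = nn.Conv2d(in_channels, in_channels, kernel_size=1)
        self.gamma = nn.Parameter(torch.zeros(1))  # learned residual weight

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        q, k, v = self.query(x), self.key(x), self.value(x)

        # Column (vertical) energies: for each column, attend along H.
        q_col = q.permute(0, 3, 2, 1).reshape(b * w, h, -1)       # (b*w, h, c')
        k_col = k.permute(0, 3, 1, 2).reshape(b * w, -1, h)       # (b*w, c', h)
        e_col = torch.bmm(q_col, k_col).reshape(b, w, h, h)       # (b, w, h, h)
        e_col = e_col.permute(0, 2, 1, 3)                         # (b, h, w, h)

        # Row (horizontal) energies: for each row, attend along W.
        q_row = q.permute(0, 2, 3, 1).reshape(b * h, w, -1)       # (b*h, w, c')
        k_row = k.permute(0, 2, 1, 3).reshape(b * h, -1, w)       # (b*h, c', w)
        e_row = torch.bmm(q_row, k_row).reshape(b, h, w, w)       # (b, h, w, w)

        # Joint softmax over each pixel's column + row candidates.
        # (The pixel itself appears in both sets here; the CCNet reference
        # masks one copy, which is omitted for brevity.)
        attn = F.softmax(torch.cat([e_col, e_row], dim=-1), dim=-1)
        a_col, a_row = attn[..., :h], attn[..., h:]

        # Aggregate values along the column ...
        v_col = v.permute(0, 3, 1, 2).reshape(b * w, c, h)        # (b*w, c, h)
        a_col = a_col.permute(0, 2, 1, 3).reshape(b * w, h, h)    # (b*w, h_out, h_in)
        out_col = torch.bmm(v_col, a_col.transpose(1, 2))         # (b*w, c, h_out)
        out_col = out_col.reshape(b, w, c, h).permute(0, 2, 3, 1) # (b, c, h, w)

        # ... and along the row.
        v_row = v.permute(0, 2, 1, 3).reshape(b * h, c, w)        # (b*h, c, w)
        a_row = a_row.reshape(b * h, w, w)                        # (b*h, w_out, w_in)
        out_row = torch.bmm(v_row, a_row.transpose(1, 2))         # (b*h, c, w_out)
        out_row = out_row.reshape(b, h, c, w).permute(0, 2, 1, 3) # (b, c, h, w)

        return self.gamma * (out_col + out_row) + x


if __name__ == "__main__":
    # Hypothetical usage on a 64x64 feature map with 128 channels.
    feat = torch.randn(2, 128, 64, 64)
    out = CrissCrossAttention(128)(feat)
    print(out.shape)  # torch.Size([2, 128, 64, 64])
```

In the abstract's terms, a discriminator in the same spirit as the Global-Local design would score a low-resolution image against the global (sentence-level) text representation and a high-resolution image against local (word-level) representations; the sketch above covers only the attention module, not that matching scheme.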