Title: EMF-GAN: Efficient Multilayer Fusion GAN for text-to-image synthesis
Authors: Wenli Chen, Huihuang Zhao
Journal: Computers & Graphics-Uk, volume 128, Article 104219
DOI: 10.1016/j.cag.2025.104219
Publication date: 2025-04-17
Publication type: Journal Article
Journal impact factor: 2.5 (JCR Q2, Computer Science, Software Engineering)
Article URL: https://www.sciencedirect.com/science/article/pii/S0097849325000603
Citations: 0
Abstract
Text-to-image generation is a challenging and significant research task that aims to synthesize high-quality images matching a given text description. Existing methods still fuse semantic information insufficiently, so the generated images do not represent the description properly. We therefore propose EMF-GAN (Efficient Multilayer Fusion Generative Adversarial Network), which uses a Multilayer Fusion Module (MF Module) and an Efficient Multi-Scale Attention Module (EMA Module) to fuse semantic information into the feature maps gradually. This makes full use of the semantic information and yields high-quality, realistic images. Extensive experiments show that EMF-GAN is highly competitive in image generation quality and semantic consistency. Compared with state-of-the-art methods, EMF-GAN improves FID on both the CUB dataset (from 14.81 to 10.74) and the COCO dataset (from 19.32 to 16.86). It generates photorealistic images with richer details and better text-image consistency. Code is available at https://github.com/zxcnmmmmm/EMF-GAN-master.
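For readers unfamiliar with the FID scores quoted in the abstract: FID (Fréchet Inception Distance) is the Fréchet distance between two Gaussians fitted to image features, so lower values mean the generated images are statistically closer to the real ones. The sketch below shows only that distance. It is not the authors' evaluation code: the function name `frechet_distance` is hypothetical, and random arrays stand in for the Inception-v3 features that real FID evaluation extracts from images.

```python
import numpy as np

def frechet_distance(feats_a, feats_b):
    """Frechet distance between Gaussians fitted to two feature sets.

    Each input has shape (num_samples, feature_dim). In a real FID
    evaluation the rows would be Inception-v3 features of real and
    generated images; that extraction step is omitted here.
    """
    mu_a, mu_b = feats_a.mean(axis=0), feats_b.mean(axis=0)
    cov_a = np.cov(feats_a, rowvar=False)
    cov_b = np.cov(feats_b, rowvar=False)
    # Tr((cov_a cov_b)^(1/2)) equals the sum of the square roots of the
    # eigenvalues of cov_a @ cov_b, which are real and non-negative for
    # covariance matrices; clip tiny negative values from rounding.
    eigvals = np.linalg.eigvals(cov_a @ cov_b)
    trace_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()
    diff = mu_a - mu_b
    return float(diff @ diff + np.trace(cov_a) + np.trace(cov_b) - 2.0 * trace_sqrt)

# Stand-in features (assumption: random data, not image features).
rng = np.random.default_rng(0)
real = rng.normal(size=(500, 8))
fake = rng.normal(loc=0.5, size=(500, 8))
print(frechet_distance(real, real))  # close to zero for identical sets
print(frechet_distance(real, fake))  # larger when the distributions differ
```

Under this reading, the reported drop from 14.81 to 10.74 on CUB indicates that the generated image features moved closer to the real image features.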
About the journal:
Computers & Graphics is dedicated to disseminating information on research and applications of computer graphics (CG) techniques. The journal encourages articles on:
1. Research and applications of interactive computer graphics. We are particularly interested in novel interaction techniques and applications of CG to problem domains.
2. State-of-the-art papers on late-breaking, cutting-edge research on CG.
3. Information on innovative uses of graphics principles and technologies.
4. Tutorial papers on both teaching CG principles and innovative uses of CG in education.