Title: DFS-GAN: A One-Stage Backbone Enhancement Model for Text-to-Image
Authors: Junkai Yi, Yiran Wei, Lingling Tan
Journal: Electronics Letters, vol. 61, no. 1
DOI: 10.1049/ell2.70399
Publication date: 2025-08-27
Publication type: Journal Article
Impact factor: 0.8 (JCR Q4, Engineering, Electrical & Electronic)
Full text: https://ietresearch.onlinelibrary.wiley.com/doi/10.1049/ell2.70399
Open-access PDF: https://ietresearch.onlinelibrary.wiley.com/doi/epdf/10.1049/ell2.70399
Citations: 0
Abstract
Text-to-image synthesis relies primarily on generative adversarial networks (GANs). However, traditional GANs face several challenges, such as weak semantic correlation between the generated images and the input text, blurred details and poor structural integrity, and the prevalent use of redundant multi-stage network architectures. In this paper, we propose DFS-GAN, an enhancement of the deep fusion generative adversarial network (DF-GAN) that incorporates a self-attention mechanism. The DF-GAN generator is more streamlined than earlier network models, enabling it to synthesise images of higher quality and stronger text-image semantic consistency. We also make targeted improvements to the limitations noted in the DF-GAN paper, specifically the model's ability to synthesise fine-grained features and its use of existing pre-trained large models. Bidirectional encoder representations from transformers (BERT) is used to mine the semantic features of the text context, and the deep text-image fusion block (DFBlock) is added to match deep text semantics with regional image features. A self-attention module is then introduced at the architecture level as a supplement to the convolution modules, with the aim of better capturing long-distance and multi-level dependencies. The experimental results show that the proposed DFS-GAN model not only strengthens the semantic relationship between the text and the image but also preserves the fine details and overall integrity of the generated image.
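The two architectural ingredients described in the abstract can be sketched in a few lines: a DFBlock-style affine modulation that scales and shifts image features channel-wise using parameters predicted from a text embedding, and a SAGAN-style self-attention layer that lets every spatial position attend to every other. The NumPy sketch below is a minimal illustration under those assumptions, not the authors' implementation; all function names and projection matrices (`wq`, `wk`, `wv`, `w_gamma`, `w_beta`) are hypothetical stand-ins for learned parameters.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def affine_fusion(feat, text_emb, w_gamma, w_beta):
    """DFBlock-style affine modulation (sketch).

    feat: (C, H, W) image features; text_emb: (D,) sentence embedding.
    w_gamma, w_beta: (D, C) learned projections predicting per-channel
    scale and shift from the text, fusing text semantics into the image.
    """
    gamma = text_emb @ w_gamma            # (C,) channel-wise scale
    beta = text_emb @ w_beta              # (C,) channel-wise shift
    return gamma[:, None, None] * feat + beta[:, None, None]

def self_attention_2d(feat, wq, wk, wv, gamma=0.0):
    """SAGAN-style self-attention over a (C, H, W) feature map (sketch).

    wq, wk: (C, C // 8) query/key projections; wv: (C, C) value projection.
    Returns a residual mix gamma * attention_output + feat, so with
    gamma = 0 (typical initialisation) the layer is the identity.
    """
    c, h, w = feat.shape
    x = feat.reshape(c, h * w).T          # (N, C), N = H*W spatial positions
    q, k, v = x @ wq, x @ wk, x @ wv      # project positions to q/k/v
    attn = softmax(q @ k.T, axis=-1)      # (N, N): each position attends to all
    out = (attn @ v).T.reshape(c, h, w)   # aggregate values, restore layout
    return gamma * out + feat
```

Because the attention map is N x N over all spatial positions, a single such layer captures the long-distance dependencies that a stack of small convolutions only reaches after many layers, which matches the abstract's motivation for adding it alongside the convolution modules.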
Journal overview:
Electronics Letters is an internationally renowned peer-reviewed rapid-communication journal that publishes short original research papers every two weeks. Its broad and interdisciplinary scope covers the latest developments in all electronic engineering related fields including communication, biomedical, optical and device technologies. Electronics Letters also provides further insight into some of the latest developments through special features and interviews.
Scope
As a journal at the forefront of its field, Electronics Letters publishes papers covering all themes of electronic and electrical engineering. The major themes of the journal are listed below.
Antennas and Propagation
Biomedical and Bioinspired Technologies, Signal Processing and Applications
Control Engineering
Electromagnetism: Theory, Materials and Devices
Electronic Circuits and Systems
Image, Video and Vision Processing and Applications
Information, Computing and Communications
Instrumentation and Measurement
Microwave Technology
Optical Communications
Photonics and Opto-Electronics
Power Electronics, Energy and Sustainability
Radar, Sonar and Navigation
Semiconductor Technology
Signal Processing
MIMO