{"title":"Multiscale contextual joint feature enhancement GAN for semantic image synthesis","authors":"Hengyou Wang , Rongxin Ma , Xiang Jiang","doi":"10.1016/j.imavis.2025.105637","DOIUrl":null,"url":null,"abstract":"<div><div>Semantic image synthesis aims to generate images conditioned on semantic segmentation maps. Existing methods typically employ a generative adversarial framework to combine latent variables with semantic segmentation maps. However, traditional convolutions and feature map complexity often lead to issues such as uneven color, unrealistic textures, and blurred edges in generated images. To address these issues, we propose the Multiscale Contextual Joint Feature Enhancement Generative Adversarial Network, called MSCJ-GAN. Specifically, to capture local details and enhance the global consistency of large-scale objects, a large receptive field feature enhancement module based on Fast Fourier Convolution (FFC) and Transformer is introduced. This module employs an attention mechanism in the frequency domain, enabling neurons in the early layers of the network to access contextual information from the entire image. Furthermore, to ensure clear and realistic textures for objects and their boundaries, a dual-dimensional feature enhancement module based on bias is proposed. This module fully utilizes the statistical features in the feature maps, channel differences, and the detailed expression of the bias matrix to improve the realism of the generated images. Finally, experimental results on three challenging datasets demonstrate that the proposed MSCJ-GAN outperforms state-of-the-art methods, achieving superior performance in generating large-scale objects (e.g., sky and grass) and intricate texture details (e.g., wrinkles and micro-expressions). 
The code will be released after this work is published: <span><span>https://github.com/xinxin0312/MSCJ-GAN</span></span>.</div></div>","PeriodicalId":50374,"journal":{"name":"Image and Vision Computing","volume":"161 ","pages":"Article 105637"},"PeriodicalIF":4.2000,"publicationDate":"2025-07-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Image and Vision Computing","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0262885625002252","RegionNum":3,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
Citations: 0
Abstract
Semantic image synthesis aims to generate images conditioned on semantic segmentation maps. Existing methods typically employ a generative adversarial framework to combine latent variables with semantic segmentation maps. However, traditional convolutions and feature map complexity often lead to issues such as uneven color, unrealistic textures, and blurred edges in generated images. To address these issues, we propose the Multiscale Contextual Joint Feature Enhancement Generative Adversarial Network, called MSCJ-GAN. Specifically, to capture local details and enhance the global consistency of large-scale objects, a large receptive field feature enhancement module based on Fast Fourier Convolution (FFC) and Transformer is introduced. This module employs an attention mechanism in the frequency domain, enabling neurons in the early layers of the network to access contextual information from the entire image. Furthermore, to ensure clear and realistic textures for objects and their boundaries, a dual-dimensional feature enhancement module based on bias is proposed. This module fully utilizes the statistical features in the feature maps, channel differences, and the detailed expression of the bias matrix to improve the realism of the generated images. Finally, experimental results on three challenging datasets demonstrate that the proposed MSCJ-GAN outperforms state-of-the-art methods, achieving superior performance in generating large-scale objects (e.g., sky and grass) and intricate texture details (e.g., wrinkles and micro-expressions). The code will be released after this work is published: https://github.com/xinxin0312/MSCJ-GAN.
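The abstract's key mechanism is frequency-domain mixing in the spirit of Fast Fourier Convolution: because a pointwise operation on the 2-D spectrum touches every spatial position at once, even early layers gain an image-wide receptive field. The sketch below illustrates only that principle; the paper's actual modules (the FFC/Transformer attention block and the bias-based dual-dimensional enhancement) are not public, so the per-channel spectral weights, the sigmoid gate derived from channel statistics, and the additive bias matrix here are all illustrative stand-ins, not the authors' implementation.

```python
import numpy as np

def spectral_transform(x, w_real, w_imag):
    """FFC-style global branch (illustrative): transform to the frequency
    domain, apply per-frequency weights, transform back. Each output pixel
    then depends on the entire input map."""
    # x: (C, H, W) real-valued feature map
    spec = np.fft.rfft2(x, axes=(-2, -1))            # (C, H, W//2 + 1), complex
    spec = spec * (w_real + 1j * w_imag)             # stand-in for learned spectral weights
    return np.fft.irfft2(spec, s=x.shape[-2:], axes=(-2, -1))

def channel_bias_enhance(x, bias):
    """Toy stand-in for statistics-driven channel enhancement with an
    additive bias matrix: gate each channel by its own mean/std, then
    add a spatial bias."""
    mean = x.mean(axis=(-2, -1), keepdims=True)      # per-channel mean
    std = x.std(axis=(-2, -1), keepdims=True) + 1e-6 # per-channel std (stabilized)
    gate = 1.0 / (1.0 + np.exp(-(mean / std)))       # sigmoid gate from channel statistics
    return gate * x + bias                           # reweight channels, add bias matrix

rng = np.random.default_rng(0)
C, H, W = 4, 16, 16
x = np.zeros((C, H, W))
x[:, H // 2, W // 2] = 1.0                           # single-pixel impulse probe
w_real = rng.normal(size=(C, H, W // 2 + 1))
w_imag = rng.normal(size=(C, H, W // 2 + 1))

y = spectral_transform(x, w_real, w_imag)
# the impulse spreads across (nearly) the whole map: a global receptive field
spread = np.count_nonzero(np.abs(y) > 1e-12) / y.size
z = channel_bias_enhance(y, np.zeros((C, H, W)))
```

A plain 3x3 convolution applied to the same impulse would leave all but nine positions zero; the spectral branch is what lets early-layer neurons "see" the whole image, which is the property the abstract credits for the global consistency of large objects such as sky and grass.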
About the journal:
Image and Vision Computing has as a primary aim the provision of an effective medium of interchange for the results of high quality theoretical and applied research fundamental to all aspects of image interpretation and computer vision. The journal publishes work that proposes new image interpretation and computer vision methodology or addresses the application of such methods to real world scenes. It seeks to strengthen a deeper understanding in the discipline by encouraging the quantitative comparison and performance evaluation of the proposed methodology. The coverage includes: image interpretation, scene modelling, object recognition and tracking, shape analysis, monitoring and surveillance, active vision and robotic systems, SLAM, biologically-inspired computer vision, motion analysis, stereo vision, document image understanding, character and handwritten text recognition, face and gesture recognition, biometrics, vision-based human-computer interaction, human activity and behavior understanding, data fusion from multiple sensor inputs, image databases.