Title: SV-CGAN: Infrared image generation based on CycleGAN
Authors: Xiaopeng Zhang, Shuping Tao, Qinping Feng, Wei Dou, Haocheng Du, Miao Yu, Xiaojuan Tai, Mingyang Gao, Han Liu
Journal: Infrared Physics & Technology, volume 151, Article 106051 (impact factor 3.4; JCR Q2, Instruments & Instrumentation)
Published: 2025-07-30
DOI: 10.1016/j.infrared.2025.106051
URL: https://www.sciencedirect.com/science/article/pii/S1350449525003445
Citations: 0
Abstract
Infrared image generation has significant application value in fields such as night vision surveillance, military reconnaissance, autonomous driving, and disaster rescue. It overcomes the imaging limitations of visible-light sensors under low-light conditions, harsh weather, or complex environments, providing critical data support for all-weather perception and decision-making. However, existing deep learning-based methods for generating infrared images still face challenges, including imprecise cross-modal feature alignment and modeling bias in the thermal radiation distribution. These result in generated images with blurred details, artifact noise, and weak generalization across spectral bands, which severely limits their practical applicability. This paper proposes the SV-CGAN model, which integrates semantic segmentation with CycleGAN and employs a generator that combines U-Net with an improved Vision Transformer (ViT) at the bottleneck, along with an optimized multi-task loss function. This approach enables the generation of high-quality infrared images from unpaired data. Experimental results demonstrate that SV-CGAN achieves a peak signal-to-noise ratio (PSNR) of 30.8315 dB and a structural similarity index (SSIM) of 0.8934 on the Multispectral Pedestrian Dataset (MPD), outperforming CycleGAN (PSNR 26.5260 dB, SSIM 0.8257) by 16.2% and 8.2%, respectively, and U-GAT-IT by 4.0% and 2.7%. Additionally, the Fréchet Inception Distance (FID) decreased by 33.1%, from 114.9876 to 76.9776. The model effectively achieves visible-to-infrared image translation under unpaired training conditions, producing images with higher realism and detail retention.
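The percentage figures quoted in the abstract can be reproduced from the reported metric values. The short Python sketch below does only that arithmetic; the helper name `relative_change` is an illustrative assumption and is not taken from the paper. The abstract gives no raw U-GAT-IT scores, so that comparison is not recomputed here.

```python
def relative_change(new: float, baseline: float) -> float:
    """Percentage change of `new` relative to `baseline` (hypothetical helper)."""
    return (new - baseline) / baseline * 100.0


# Metric values as reported in the abstract (MPD dataset).
psnr_gain = relative_change(30.8315, 26.5260)    # SV-CGAN vs. CycleGAN, PSNR in dB
ssim_gain = relative_change(0.8934, 0.8257)      # SV-CGAN vs. CycleGAN, SSIM
fid_drop = -relative_change(76.9776, 114.9876)   # lower FID is better, so negate

print(f"PSNR gain: {psnr_gain:.1f}%")
print(f"SSIM gain: {ssim_gain:.1f}%")
print(f"FID reduction: {fid_drop:.1f}%")
```

Rounded to one decimal place, the three values agree with the 16.2%, 8.2%, and 33.1% stated in the abstract. Note that the PSNR percentage is taken on decibel values, which are already logarithmic, so it should be read as a relative change in the reported score rather than in the underlying error.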
About the journal:
The Journal covers the entire field of infrared physics and technology: theory, experiment, application, devices and instrumentation. "Infrared" is defined as covering the near, mid and far infrared (terahertz) regions from 0.75 µm (750 nm) to 1 mm (300 GHz). Submissions in the 300 GHz to 100 GHz region may be accepted at the editors' discretion if their content is relevant to shorter wavelengths. Submissions must be primarily concerned with and directly relevant to this spectral region.
Its core topics can be summarized as the generation, propagation and detection of infrared radiation; the associated optics, materials and devices; and its use in all fields of science, industry, engineering and medicine.
Infrared techniques occur in many different fields, notably spectroscopy and interferometry; material characterization and processing; atmospheric physics, astronomy and space research. Scientific aspects include lasers, quantum optics, quantum electronics, image processing and semiconductor physics. Some important applications are medical diagnostics and treatment, industrial inspection and environmental monitoring.