SV-CGAN: Infrared image generation based on CycleGAN

IF 3.4 · CAS Zone 3, Physics & Astrophysics · JCR Q2, Instruments & Instrumentation
Xiaopeng Zhang, Shuping Tao, Qinping Feng, Wei Dou, Haocheng Du, Miao Yu, Xiaojuan Tai, Mingyang Gao, Han Liu
DOI: 10.1016/j.infrared.2025.106051 · Infrared Physics & Technology, Vol. 151 (2025), Article 106051 · Published 2025-07-30

Infrared image generation holds significant application value in fields such as night vision surveillance, military reconnaissance, autonomous driving, and disaster rescue. It overcomes the imaging limitations of visible light sensors under low-light conditions, harsh weather, or complex environments, providing critical data support for all-weather perception and decision-making. However, existing deep learning-based methods for generating infrared images still face challenges, including imprecise cross-modal feature alignment and modeling bias in thermal radiation distribution, which result in generated images with blurred details, artifact noise, and weak generalization across different spectral bands, severely limiting their practical applicability. This paper proposes the SV-CGAN model, which integrates semantic segmentation with CycleGAN and employs a generator combining U-Net and an improved Vision Transformer (ViT) at the bottleneck, along with an optimized multi-task loss function. This approach enables the generation of high-quality infrared images under conditions of unpaired data. Experimental results demonstrate that SV-CGAN achieves a peak signal-to-noise ratio (PSNR) of 30.8315 dB and a structural similarity index (SSIM) of 0.8934 on the Multispectral Pedestrian Dataset (MPD), outperforming CycleGAN (PSNR 26.5260 dB, SSIM 0.8257) by 16.2% and 8.2%, and outperforming U-GAT-IT by 4.0% and 2.7%, respectively. Additionally, the Fréchet Inception Distance (FID) decreased by 33.1%, from 114.9876 to 76.9776. The model effectively achieves visible-to-infrared image translation under unpaired training conditions, producing images with higher realism and detail retention.
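The percentage improvements reported above follow directly from the raw metric values. A quick stdlib-Python sanity check (not part of the paper's code) reproduces them: PSNR and SSIM gains are relative increases over the CycleGAN baseline, while the FID figure is a relative decrease.

```python
# Recompute the relative changes reported in the abstract from the raw values.

def rel_change(new, old):
    """Relative change of `new` versus `old`, as a percentage."""
    return (new - old) / old * 100

# PSNR (dB) and SSIM: higher is better, so the gain is (new - old) / old.
psnr_gain = rel_change(30.8315, 26.5260)   # SV-CGAN vs. CycleGAN
ssim_gain = rel_change(0.8934, 0.8257)

# FID: lower is better, so the reported figure is the percentage decrease.
fid_change = rel_change(76.9776, 114.9876)  # negative: FID went down

print(f"PSNR gain: {psnr_gain:.1f}%")   # -> 16.2%
print(f"SSIM gain: {ssim_gain:.1f}%")   # -> 8.2%
print(f"FID drop:  {-fid_change:.1f}%") # -> 33.1%
```

All three recomputed values match the abstract's figures to one decimal place.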
Citations: 0

Source journal: Infrared Physics & Technology
CiteScore: 5.70
Self-citation rate: 12.10%
Articles per year: 400
Review time: 67 days
Aims and scope: The Journal covers the entire field of infrared physics and technology: theory, experiment, application, devices and instrumentation. "Infrared" is defined as covering the near, mid and far infrared (terahertz) regions from 0.75 µm (750 nm) to 1 mm (300 GHz). Submissions in the 300 GHz to 100 GHz region may be accepted at the editors' discretion if their content is relevant to shorter wavelengths. Submissions must be primarily concerned with and directly relevant to this spectral region. Its core topics can be summarized as the generation, propagation and detection of infrared radiation; the associated optics, materials and devices; and its use in all fields of science, industry, engineering and medicine. Infrared techniques occur in many different fields, notably spectroscopy and interferometry; material characterization and processing; and atmospheric physics, astronomy and space research. Scientific aspects include lasers, quantum optics, quantum electronics, image processing and semiconductor physics. Some important applications are medical diagnostics and treatment, industrial inspection and environmental monitoring.
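The scope statement's endpoints, 1 mm and 300 GHz, are the same boundary expressed in wavelength and frequency. A small illustrative snippet (values taken from the scope statement, not from the paper) cross-checks them with f = c/λ:

```python
# Convert the journal's stated wavelength endpoints to frequency via f = c / lambda.
C = 299_792_458.0  # speed of light in vacuum, m/s

def freq_ghz(wavelength_m):
    """Frequency in GHz for a given wavelength in metres."""
    return C / wavelength_m / 1e9

print(f"1 mm    -> {freq_ghz(1e-3):.0f} GHz")      # ~300 GHz, the far-IR boundary
print(f"0.75 um -> {freq_ghz(0.75e-6):.2e} GHz")   # ~4e5 GHz (400 THz), the near-IR boundary
```

The 1 mm endpoint indeed corresponds to roughly 300 GHz, confirming the two limits quoted in the scope are consistent.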