SV-CGAN: Infrared image generation based on CycleGAN

IF 3.4 · CAS Zone 3, Physics & Astrophysics · JCR Q2, Instruments & Instrumentation
Xiaopeng Zhang, Shuping Tao, Qinping Feng, Wei Dou, Haocheng Du, Miao Yu, Xiaojuan Tai, Mingyang Gao, Han Liu
DOI: 10.1016/j.infrared.2025.106051 · Infrared Physics & Technology, Vol. 151 (2025), Article 106051 · Published 2025-07-30

Infrared image generation holds significant application value in fields such as night vision surveillance, military reconnaissance, autonomous driving, and disaster rescue. It overcomes the imaging limitations of visible light sensors under low-light conditions, harsh weather, or complex environments, providing critical data support for all-weather perception and decision-making. However, existing deep learning-based methods for generating infrared images still face challenges, including imprecise cross-modal feature alignment and modeling bias in thermal radiation distribution, which result in generated images with blurred details, artifact noise, and weak generalization across different spectral bands, severely limiting their practical applicability. This paper proposes the SV-CGAN model, which integrates semantic segmentation with CycleGAN and employs a generator combining U-Net and an improved Vision Transformer (ViT) at the bottleneck, along with an optimized multi-task loss function. This approach enables the generation of high-quality infrared images under conditions of unpaired data. Experimental results demonstrate that SV-CGAN achieves a peak signal-to-noise ratio (PSNR) of 30.8315 dB and a structural similarity index (SSIM) of 0.8934 on the Multispectral Pedestrian Dataset (MPD), outperforming CycleGAN (PSNR 26.5260 dB, SSIM 0.8257) by 16.2% and 8.2%, and outperforming U-GAT-IT by 4.0% and 2.7%, respectively. Additionally, the Fréchet Inception Distance (FID) decreased by 33.1%, from 114.9876 to 76.9776. The model effectively achieves visible-to-infrared image translation under unpaired training conditions, producing images with higher realism and detail retention.
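The percentage improvements reported above follow directly from the raw metric values. A quick stdlib-Python sanity check (not part of the paper's code) reproduces them: PSNR and SSIM gains are relative increases over the CycleGAN baseline, while the FID figure is a relative decrease.

```python
# Recompute the relative changes reported in the abstract from the raw values.

def rel_change(new, old):
    """Relative change of `new` versus `old`, as a percentage."""
    return (new - old) / old * 100

# PSNR (dB) and SSIM: higher is better, so the gain is (new - old) / old.
psnr_gain = rel_change(30.8315, 26.5260)   # SV-CGAN vs. CycleGAN
ssim_gain = rel_change(0.8934, 0.8257)

# FID: lower is better, so the reported figure is the percentage decrease.
fid_change = rel_change(76.9776, 114.9876)  # negative: FID went down

print(f"PSNR gain: {psnr_gain:.1f}%")   # -> 16.2%
print(f"SSIM gain: {ssim_gain:.1f}%")   # -> 8.2%
print(f"FID drop:  {-fid_change:.1f}%") # -> 33.1%
```

All three recomputed values match the abstract's figures to one decimal place.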
Citations: 0

Source journal: Infrared Physics & Technology
CiteScore: 5.70
Self-citation rate: 12.10%
Articles per year: 400
Review time: 67 days
Aims and scope: The Journal covers the entire field of infrared physics and technology: theory, experiment, application, devices and instrumentation. "Infrared" is defined as covering the near, mid and far infrared (terahertz) regions from 0.75 µm (750 nm) to 1 mm (300 GHz). Submissions in the 300 GHz to 100 GHz region may be accepted at the editors' discretion if their content is relevant to shorter wavelengths. Submissions must be primarily concerned with and directly relevant to this spectral region. Its core topics can be summarized as the generation, propagation and detection of infrared radiation; the associated optics, materials and devices; and its use in all fields of science, industry, engineering and medicine. Infrared techniques occur in many different fields, notably spectroscopy and interferometry; material characterization and processing; and atmospheric physics, astronomy and space research. Scientific aspects include lasers, quantum optics, quantum electronics, image processing and semiconductor physics. Some important applications are medical diagnostics and treatment, industrial inspection and environmental monitoring.
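The scope statement's endpoints, 1 mm and 300 GHz, are the same boundary expressed in wavelength and frequency. A small illustrative snippet (values taken from the scope statement, not from the paper) cross-checks them with f = c/λ:

```python
# Convert the journal's stated wavelength endpoints to frequency via f = c / lambda.
C = 299_792_458.0  # speed of light in vacuum, m/s

def freq_ghz(wavelength_m):
    """Frequency in GHz for a given wavelength in metres."""
    return C / wavelength_m / 1e9

print(f"1 mm    -> {freq_ghz(1e-3):.0f} GHz")      # ~300 GHz, the far-IR boundary
print(f"0.75 um -> {freq_ghz(0.75e-6):.2e} GHz")   # ~4e5 GHz (400 THz), the near-IR boundary
```

The 1 mm endpoint indeed corresponds to roughly 300 GHz, confirming the two limits quoted in the scope are consistent.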