Multi spectral visible-thermal IR image translation using improved u-net & conditional diffusion

Impact Factor: 6.5 · CAS Region 2 (Computer Science) · JCR Q1 (COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE)
Mahroosh Banday, Brejesh Lall
DOI: 10.1016/j.neucom.2025.131006
Journal: Neurocomputing, Volume 651, Article 131006
Publication date: 2025-07-14
URL: https://www.sciencedirect.com/science/article/pii/S0925231225016789
Citations: 0

Abstract

Translating images from the visible spectrum to the thermal IR (TIR) domain so as to achieve precise and realistic TIR representations is a challenging task. Thermal infrared imaging is of great significance in scenarios where vision is severely impaired, especially in difficult lighting conditions such as night, haze, fog, or cloudy weather. With these advantages, infrared imaging finds extensive application in navigation, surveillance, object detection, product inspection, agriculture, and remote sensing. Building high-performance deep models for such a wide range of applications requires a large amount of TIR training data. However, sufficient IR datasets are unavailable owing to the high cost of thermal infrared camera setups. Since large numbers of visible-image datasets are available, this scarcity of TIR data can be addressed by translating visible images into their TIR counterparts. In this paper, we leverage the widely available visible-range data to propose two visible-to-TIR domain translation approaches: a modified U-Net-based non-generative approach called TIR-UNet, and a conditional-diffusion-based generative approach that also uses a U-Net as the neural backbone for synthesizing TIR images. Both proposed methods have been evaluated on four benchmark datasets and demonstrate strong qualitative and quantitative performance in generating perceptually realistic, visually plausible, and high-quality TIR equivalents of given visible images. Compared to state-of-the-art methods, including U-Net and powerful GAN variants, our methods achieve a remarkable performance increase on the MSE, PSNR, and SSIM metrics for both day and night images.
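The abstract does not detail the conditional diffusion method itself; as generic background on the technique it names, the standard DDPM-style forward (noising) process that conditional diffusion models build on can be sketched as follows. All function names, the schedule parameters, and the toy data here are illustrative assumptions, not taken from the paper:

```python
import numpy as np

def linear_beta_schedule(T, beta_start=1e-4, beta_end=0.02):
    # Linear variance schedule commonly used in DDPMs
    return np.linspace(beta_start, beta_end, T)

def q_sample(x0, t, alpha_bar, noise):
    # Forward diffusion: corrupt a clean TIR target x0 to timestep t
    # x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * eps
    return np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * noise

T = 1000
betas = linear_beta_schedule(T)
alpha_bar = np.cumprod(1.0 - betas)  # cumulative signal-retention factor

# In training, a conditional denoiser eps_theta(x_t, t, visible_image)
# (here, a U-Net backbone) would be trained to predict `noise`;
# below we only construct the (x_t, noise) training pair on toy data.
x0 = np.random.rand(1, 64, 64)       # toy "clean TIR image"
noise = np.random.randn(*x0.shape)
x_t = q_sample(x0, 500, alpha_bar, noise)
```

At sampling time, such a model would start from pure noise and iteratively denoise while conditioning on the input visible image, yielding the synthesized TIR output.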
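The evaluation metrics the abstract reports (MSE, PSNR, SSIM) are standard full-reference image-quality measures. A minimal NumPy sketch is given below; note the SSIM here is a simplified single-window (whole-image) variant rather than the usual sliding-window implementation, and the function names are ours:

```python
import numpy as np

def mse(a, b):
    # Mean squared error between two images scaled to [0, 1]; lower is better
    return float(np.mean((a - b) ** 2))

def psnr(a, b, max_val=1.0):
    # Peak signal-to-noise ratio in dB; higher is better
    m = mse(a, b)
    return float("inf") if m == 0 else 10.0 * np.log10(max_val ** 2 / m)

def ssim_global(a, b, max_val=1.0):
    # Simplified SSIM computed over one global window; 1.0 = identical
    c1, c2 = (0.01 * max_val) ** 2, (0.03 * max_val) ** 2
    mu_a, mu_b = a.mean(), b.mean()
    var_a, var_b = a.var(), b.var()
    cov = ((a - mu_a) * (b - mu_b)).mean()
    return float(((2 * mu_a * mu_b + c1) * (2 * cov + c2)) /
                 ((mu_a ** 2 + mu_b ** 2 + c1) * (var_a + var_b + c2)))
```

For example, a perfect translation gives MSE 0, infinite PSNR, and SSIM 1.0 against the ground-truth TIR image; practical comparisons would use a windowed SSIM (e.g. scikit-image's `structural_similarity`).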
Source journal
Neurocomputing (Engineering & Technology / Computer Science: Artificial Intelligence)
CiteScore: 13.10
Self-citation rate: 10.00%
Annual article count: 1382
Review time: 70 days
Journal description: Neurocomputing publishes articles describing recent fundamental contributions in the field of neurocomputing. Neurocomputing theory, practice and applications are the essential topics being covered.