Multi-scale transformer network for super-resolution of visible and thermal air images

Intelligent Systems with Applications. Pub Date: 2024-09-01. DOI: 10.1016/j.iswa.2024.200429

Hèdi Fkih, Abdelaziz Kallel, Zied Chtourou

Volume 23, Article 200429. Publication type: Journal Article. Full text: https://www.sciencedirect.com/science/article/pii/S2667305324001030


Abstract

Reference image-based Super-Resolution (RefSR) improves the quality of a Low-Resolution (LR) input image by leveraging the additional information provided by a High-Resolution (HR) reference image (Ref). Existing RefSR methods process the thermal and visible streams separately, and they often struggle to enhance the resolution of small objects such as Mini/Micro Unmanned Aerial Vehicles (UAVs) because of the resolution disparity between the input and reference images. To cope with these challenges in early UAV detection for video surveillance, we propose ThermoVisSR, a multiscale texture transformer for the Super-Resolution (SR) of visible and thermal images of Mini/Micro UAVs. Our approach reconstructs the fine details of these objects while preserving their approximation (the body form and color of the different scene objects) already contained in the LR image. The model is therefore divided into two streams that deal separately with approximation and detail reconstruction. In the first stream, a Convolutional Neural Network (CNN) fusion backbone extracts the Low-Frequency (LF) approximation from the original LR image pairs. In the second stream, which extracts the details from the Ref image, features from the visible and thermal sources are blended to make the most of what each offers. We then apply the High-Frequency Texture Transformer (HFTT) across several resolutions of the merged features to ensure accurate correspondence matching and a significant transfer of High-Frequency (HF) patches from the Ref to the LR images. Moreover, to adapt the injection to the different bands, we incorporate the separable software decoder (SSD) into the HFTT, which allows it to capture channel-specific details during the reconstruction phase. We validated our approach on a newly created dataset of air images of Mini/Micro UAVs. Experimental results demonstrate that the proposed model consistently outperforms state-of-the-art approaches in both qualitative and quantitative assessments.
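
The idea of keeping the LR approximation and injecting HF detail from the best-matching Ref patch can be illustrated with a toy sketch. This is not the authors' HFTT: it is a generic hard-matching scheme of the kind used in texture-transformer RefSR, applied to 1-D patches in plain Python. The function names (`blur`, `centered`, `cosine`, `transfer_details`), the 3-tap blur standing in for the resolution gap, and the example patches are all assumptions made for illustration.

```python
import math

def blur(patch):
    """3-tap moving average with clamped edges: a crude low-pass filter (assumed)."""
    n = len(patch)
    return [(patch[max(i - 1, 0)] + patch[i] + patch[min(i + 1, n - 1)]) / 3
            for i in range(n)]

def centered(patch):
    """Remove the mean so that matching ignores overall brightness."""
    mean = sum(patch) / len(patch)
    return [x - mean for x in patch]

def cosine(a, b):
    """Cosine similarity between two flattened patches (0.0 if either is flat)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def transfer_details(lr_patches, ref_patches):
    """For each LR patch, add the HF residual of the most similar Ref patch.

    Keys are blurred Ref patches (brought to the LR frequency band) and
    values are the HF residuals Ref - blur(Ref). The LR patch itself is
    kept as the low-frequency approximation.
    """
    keys = [centered(blur(p)) for p in ref_patches]
    values = [[x - y for x, y in zip(p, blur(p))] for p in ref_patches]
    restored, matches = [], []
    for lr in lr_patches:
        query = centered(lr)
        scores = [cosine(query, k) for k in keys]
        best = max(range(len(scores)), key=scores.__getitem__)  # hard match
        weight = min(max(scores[best], 0.0), 1.0)  # confidence of the match
        restored.append([x + weight * h for x, h in zip(lr, values[best])])
        matches.append(best)
    return restored, matches

# Hypothetical data: two Ref patches, and an LR patch that is a blurred
# copy of the first one.
ref = [[0, 0, 9, 0, 0], [9, 0, 0, 0, 9]]
lr = [blur(ref[0])]
restored, matches = transfer_details(lr, ref)
print(matches, restored)
```

Matching against blurred Ref patches rather than the sharp ones keeps the query and the keys in a comparable frequency band, while the transferred values still carry the sharp detail. A real model would do this on learned feature maps at several scales instead of raw pixel patches.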

