An Wavelet Steered network for efficient infrared small target detection
Teng Ma, Kuanhong Cheng, Tingting Chai, Yubo Wu, Huixin Zhou
Infrared Physics & Technology, Volume 148, Article 105850. Published 2025-04-30. DOI: 10.1016/j.infrared.2025.105850
Journal impact factor: 3.1 (JCR Q2, Instruments & Instrumentation)
https://www.sciencedirect.com/science/article/pii/S1350449525001434
Citations: 0
Abstract
This paper introduces an approach to infrared small target detection (IRSTD). Our model is primarily motivated by the integration of the Discrete Wavelet Transform (DWT) into a convolutional neural network (CNN), combining a classical multi-scale analysis method with the deep learning (DL) framework. First, we argue that downsampling operations such as average or max pooling can lose detail, and that interpolation-based upsampling adds little useful information. Haar DWT and inverse DWT (IDWT) are therefore embedded for lossless downsampling and upsampling, facilitating more effective feature extraction within the CNN. Second, a hybrid attention mechanism, referred to as the Wavelet Steered Transformer (WST), is designed to enhance the DWT features both spatially and across channels. This mechanism consists of two key components: (1) Channel-wise Transformer: we adapt a channel-wise Transformer to enhance semantic information and suppress background clutter and noise, so that target and background features are distributed across different channels, thereby boosting detection performance. (2) Dilated-Gate Convolutional Module: a dilated-gate convolutional module is employed to improve spatial localization accuracy. Unlike previous methods for location extraction, this module combines different kernel sizes and dilation rates. Experimental results on benchmark datasets demonstrate the superior performance of the proposed method. The code and data for this paper will be released at https://github.com/Fortuneteller6/WaveTD once the paper is accepted.
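To make the "lossless downsampling" claim concrete, the following is a minimal NumPy sketch of a one-level 2D Haar DWT and its inverse on a single-channel array. It is not the authors' WaveTD implementation: the function names `haar_dwt2` and `haar_idwt2`, the sub-band naming (LL, LH, HL, HH), and the orthonormal scaling by 1/2 are assumptions chosen for illustration. The sketch also checks that 2x2 average pooling equals the LL band up to a constant factor, which suggests that pooling keeps only one of the four sub-bands and discards the three detail bands.

```python
import numpy as np

def haar_dwt2(x):
    """One-level 2D Haar DWT of an (H, W) array with even H and W.

    Returns four (H/2, W/2) sub-bands. The names LL, LH, HL, HH follow
    one common convention; other texts swap LH and HL.
    """
    a = x[0::2, 0::2]  # top-left pixel of each 2x2 block
    b = x[0::2, 1::2]  # top-right
    c = x[1::2, 0::2]  # bottom-left
    d = x[1::2, 1::2]  # bottom-right
    ll = (a + b + c + d) / 2.0  # low-pass in both directions
    lh = (a - b + c - d) / 2.0  # difference between columns
    hl = (a + b - c - d) / 2.0  # difference between rows
    hh = (a - b - c + d) / 2.0  # diagonal difference
    return ll, lh, hl, hh

def haar_idwt2(ll, lh, hl, hh):
    """Inverse of haar_dwt2: rebuilds the (2H, 2W) array exactly."""
    h, w = ll.shape
    x = np.empty((2 * h, 2 * w), dtype=ll.dtype)
    x[0::2, 0::2] = (ll + lh + hl + hh) / 2.0
    x[0::2, 1::2] = (ll - lh + hl - hh) / 2.0
    x[1::2, 0::2] = (ll + lh - hl - hh) / 2.0
    x[1::2, 1::2] = (ll - lh - hl + hh) / 2.0
    return x

# Synthetic example: noisy background with one bright pixel as a "small target".
rng = np.random.default_rng(0)
img = rng.normal(size=(8, 8))
img[3, 4] += 10.0

ll, lh, hl, hh = haar_dwt2(img)
restored = haar_idwt2(ll, lh, hl, hh)

# DWT followed by IDWT recovers the input up to floating-point error.
reconstruction_ok = np.allclose(restored, img)

# 2x2 average pooling is the LL band divided by 2; the detail bands are dropped.
avg_pool = img.reshape(4, 2, 4, 2).mean(axis=(1, 3))
pooling_is_ll = np.allclose(avg_pool, ll / 2.0)
```

In a CNN the same transform would typically be applied per channel, with the four sub-bands stacked along the channel axis so that spatial resolution halves while no values are discarded; how the paper arranges and processes the sub-bands is not specified in the abstract.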
About the journal:
The Journal covers the entire field of infrared physics and technology: theory, experiment, application, devices and instrumentation. "Infrared" is defined as covering the near, mid and far infrared (terahertz) regions from 0.75 µm (750 nm) to 1 mm (300 GHz). Submissions in the 300 GHz to 100 GHz region may be accepted at the editors' discretion if their content is relevant to shorter wavelengths. Submissions must be primarily concerned with and directly relevant to this spectral region.
Its core topics can be summarized as the generation, propagation and detection of infrared radiation; the associated optics, materials and devices; and its use in all fields of science, industry, engineering and medicine.
Infrared techniques occur in many different fields, notably spectroscopy and interferometry; material characterization and processing; atmospheric physics, astronomy and space research. Scientific aspects include lasers, quantum optics, quantum electronics, image processing and semiconductor physics. Some important applications are medical diagnostics and treatment, industrial inspection and environmental monitoring.