MDSF: A Plug-and-Play Block for Boosting Infrared Small Target Detection in YOLO-Based Networks
Authors: Yonghao Gu; Ying Guo; Wei Xie; Zhe Wu; Shibo Dong; Guokang Xie; Weifeng Xu
Journal: IEEE Transactions on Geoscience and Remote Sensing, vol. 63, pp. 1-14
Publication date: 2025-03-05
DOI: 10.1109/TGRS.2025.3566889
URL: https://ieeexplore.ieee.org/document/10985819/
Impact factor: 8.6 (JCR Q1, Engineering, Electrical & Electronic)
Citations: 0
Abstract
This article addresses infrared small target detection, aiming to improve detection accuracy and robustness in complex, low-contrast infrared environments. We propose several enhancements to YOLO-based models, which are commonly used for real-time target detection. First, we introduce a multiscale dilated separable fusion (MDSF) block, a flexible plug-in that can replace traditional convolution layers and be inserted at various stages of the network. The block increases the network’s sensitivity to small targets by combining large convolution kernels with multiscale decomposition. Next, we design a deep feature fusion (DFF) module and an MDSF-Head based on the MDSF block and integrate them into YOLO models (v5–v11), improving mAP@50 by 5.4%–9.6%. Furthermore, we propose the coarse-to-fine spatial and channel reconstruction convolution (C2f_SCConv) module, which fuses shallow spatial features with deep semantic features and boosts detection performance, particularly for occluded and small targets. Additionally, we incorporate the spatial-to-depth (SPD) convolution module and replace the traditional complete intersection over union (CIoU) with efficient intersection over union (EIoU) to further optimize the model. Experimental results on the forward-looking infrared (FLIR) ADAS dataset show that our approach outperforms the baseline YOLOv8n by 10.9% in mAP@50 and 10.3% in mAP@50-95. On the HIT-UAV dataset, a high-altitude infrared thermal dataset for unmanned aerial vehicle (UAV)-based object detection, mAP@50 increases by 8.1% and mAP@50-95 by 9.7%. These results indicate that the proposed method substantially improves detection accuracy, robustness, and adaptability in challenging infrared environments.
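The abstract names EIoU as the replacement for CIoU but does not give its formulation. As a rough illustration only, the sketch below computes the EIoU loss for a single pair of boxes as it is commonly defined in the literature: one minus IoU, plus normalized penalties on center distance, width difference, and height difference. The function name `eiou_loss`, the `(x1, y1, x2, y2)` box format, and the `eps` stabilizer are assumptions made for this example; it is not the authors' implementation.

```python
def eiou_loss(box_a, box_b, eps=1e-9):
    """EIoU loss for two axis-aligned boxes given as (x1, y1, x2, y2).

    Hypothetical helper for illustration; assumes x2 > x1 and y2 > y1.
    """
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Widths and heights of each box.
    aw, ah = ax2 - ax1, ay2 - ay1
    bw, bh = bx2 - bx1, by2 - by1

    # Intersection over union.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    iou = inter / (union + eps)

    # Width and height of the smallest box enclosing both inputs.
    cw = max(ax2, bx2) - min(ax1, bx1)
    ch = max(ay2, by2) - min(ay1, by1)

    # Squared distance between box centers, normalized by the
    # squared diagonal of the enclosing box.
    dx = (ax1 + ax2) / 2.0 - (bx1 + bx2) / 2.0
    dy = (ay1 + ay2) / 2.0 - (by1 + by2) / 2.0
    center_term = (dx * dx + dy * dy) / (cw * cw + ch * ch + eps)

    # Width and height differences, each normalized by the matching
    # side of the enclosing box. CIoU uses a single aspect-ratio term
    # in place of these two.
    width_term = (aw - bw) ** 2 / (cw * cw + eps)
    height_term = (ah - bh) ** 2 / (ch * ch + eps)

    return 1.0 - iou + center_term + width_term + height_term
```

Because width and height are penalized separately, the loss still changes when two boxes share an aspect ratio but differ in size, which is one commonly cited reason for preferring EIoU when targets span only a few pixels.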
About the journal:
IEEE Transactions on Geoscience and Remote Sensing (TGRS) is a monthly publication that focuses on the theory, concepts, and techniques of science and engineering as applied to sensing the land, oceans, atmosphere, and space; and the processing, interpretation, and dissemination of this information.