Multiscale Channel Attention and Cross-Layer Fusion Network for Infrared Small Target Detection
Shengli Zhou, Tong Liu, Xiaolu Guo, Meibo Lv
IEEE Geoscience and Remote Sensing Letters, vol. 22, pp. 1-5, published 2025-06-18
DOI: 10.1109/LGRS.2025.3580709
https://ieeexplore.ieee.org/document/11039771/
Infrared small target detection (IRSTD) is challenging because of limited global perception and feature ambiguity in complex scenes. To address these issues, we propose a multiscale channel attention and cross-layer fusion network (MACFNet). The framework integrates three key innovations: 1) a feature convolution attention transformer (FCAT) that addresses limited global perception by combining local features with global context to enhance target representation; 2) an efficient channel and spatial attention (ECSA) module that resolves feature ambiguity by optimizing the weighting of discriminative features; and 3) an enhanced M-UNet architecture that incorporates channelwise cross fusion transformer (CCT) modules to enable effective cross-scale semantic alignment. Extensive experiments on the SIRST and NUDT-SIRST datasets demonstrate state-of-the-art performance, with IoU scores of 0.8396 and 0.9346, respectively, surpassing existing model-driven and data-driven methods while maintaining real-time inference at 29.7167 FPS.
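The abstract reports IoU on SIRST and NUDT-SIRST but does not say how the score is aggregated over a test set. The sketch below is a hedged illustration, not the authors' evaluation code. It computes IoU between binary target masks in two common ways:

- averaged per image;
- accumulated over all images before a single division.

The function names, the toy masks, and the convention that an empty union scores 1.0 are all hypothetical choices made for this example.

```python
from typing import Sequence

# A 2D binary mask: 1 marks a target pixel, 0 marks background.
# Assumption: prediction and ground-truth masks always have the same shape.
Mask = Sequence[Sequence[int]]


def inter_union(pred: Mask, gt: Mask) -> tuple[int, int]:
    """Count intersection and union pixels of two binary masks."""
    inter = union = 0
    for pred_row, gt_row in zip(pred, gt):
        for p, g in zip(pred_row, gt_row):
            inter += int(bool(p) and bool(g))
            union += int(bool(p) or bool(g))
    return inter, union


def mean_image_iou(preds: list[Mask], gts: list[Mask]) -> float:
    """Average of per-image IoU (hypothetical convention: empty union -> 1.0)."""
    scores = []
    for pred, gt in zip(preds, gts):
        inter, union = inter_union(pred, gt)
        scores.append(inter / union if union else 1.0)
    return sum(scores) / len(scores)


def dataset_iou(preds: list[Mask], gts: list[Mask]) -> float:
    """Sum intersections and unions over the whole set, then divide once."""
    total_inter = total_union = 0
    for pred, gt in zip(preds, gts):
        inter, union = inter_union(pred, gt)
        total_inter += inter
        total_union += union
    return total_inter / total_union if total_union else 1.0


def make_mask(h: int, w: int, pixels: set[tuple[int, int]]) -> list[list[int]]:
    """Build an h x w mask with the given (row, col) pixels set to 1."""
    return [[int((r, c) in pixels) for c in range(w)] for r in range(h)]


# Toy data: image 1 has a 2x2 target and a prediction one column too wide;
# image 2 has a single-pixel target that the prediction misses entirely.
gt1 = make_mask(8, 8, {(3, 3), (3, 4), (4, 3), (4, 4)})
pred1 = make_mask(8, 8, {(3, 3), (3, 4), (3, 5), (4, 3), (4, 4), (4, 5)})
gt2 = make_mask(8, 8, {(0, 0)})
pred2 = make_mask(8, 8, set())

print(round(mean_image_iou([pred1, pred2], [gt1, gt2]), 4))  # 0.3333
print(round(dataset_iou([pred1, pred2], [gt1, gt2]), 4))     # 0.5714
```

On this toy data the two conventions give different results, 0.3333 versus 0.5714.

- The per-image mean is pulled down hard by the missed single-pixel target.
- The pooled ratio is dominated by the larger target, so the miss matters less.

IoU figures from different papers are therefore only directly comparable when they are aggregated the same way.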