Shichao Zhou;Zekai Zhang;Yingrui Zhao;Wenzheng Wang;Zhuowei Wang
IEEE Geoscience and Remote Sensing Letters, vol. 22, pp. 1-5, published 2025-04-23. DOI: 10.1109/LGRS.2025.3563588. https://ieeexplore.ieee.org/document/10974993/
Single-Frame Infrared Small Target Detection With Dynamic Multidimensional Convolution
Largely as a consequence of remote imaging, the target of interest in infrared imagery tends to occupy very few pixels and to have faint radiation values. The lack of discriminative spatial features in infrared small targets challenges traditional single-frame detectors, which rely on handcrafted filters to amplify local contrast. Recently, deep convolutional network (DCN)-based detectors have used elaborate multiscale spatial context representations to “semantically reason” about small, dim infrared targets at the pixel level. However, the repeated spatial convolution and downsampling operations adopted by these leading methods can lose target appearance information during the initial feature-encoding stage. To enhance low-level feature representation, we draw on the insight of the traditional matched filter and propose a novel pixel-adaptive convolution kernel modulated by multidimensional contexts, i.e., dynamic multidimensional convolution (DMConv). Specifically, DMConv is refined by three collaborative and indispensable attention functions that act on the spatial layout, the channels, and the number of convolution kernels, respectively, so as to effectively mine, highlight, and enrich fine-grained spatial features at a moderate computational cost. Extensive experiments on two real-world single-frame infrared image datasets, SIRST and IRSTD-1k (Infrared Small Target Detection), demonstrate the effectiveness of the proposed method, which achieves consistent performance improvements over other state-of-the-art (SOTA) detectors.
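The abstract gives no formulas, so the following is only an illustrative sketch of how a convolution kernel might be modulated along the three dimensions it names (spatial layout, channel, and kernel number), not the authors' implementation. Several things here are assumptions: the function name `dynamic_conv2d`, the weight matrices `w_spatial`, `w_channel`, and `w_kernel`, and the use of a single global-average-pooled context vector per image. That last choice makes the kernel sample-adaptive, a simplification of the pixel-adaptive kernel the abstract describes.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()


def dynamic_conv2d(x, kernels, w_spatial, w_channel, w_kernel):
    """Hypothetical dynamic convolution with three attention terms.

    x        : (C_in, H, W) input feature map
    kernels  : (n, C_out, C_in, k, k) bank of n candidate kernels, k odd
    w_spatial: (k*k, C_in)  assumed linear map -> spatial attention
    w_channel: (C_in, C_in) assumed linear map -> channel attention
    w_kernel : (n, C_in)    assumed linear map -> kernel-number attention
    """
    n, c_out, c_in, k, _ = kernels.shape

    # Assumption: context is a global average pool over the spatial axes.
    context = x.mean(axis=(1, 2))  # (C_in,)

    # One attention vector per kernel dimension, shaped for broadcasting.
    a_spatial = sigmoid(w_spatial @ context).reshape(1, 1, 1, k, k)
    a_channel = sigmoid(w_channel @ context).reshape(1, 1, c_in, 1, 1)
    a_kernel = softmax(w_kernel @ context).reshape(n, 1, 1, 1, 1)

    # Modulate every candidate kernel, then aggregate over the n candidates.
    kernel = (kernels * a_spatial * a_channel * a_kernel).sum(axis=0)

    # Stride-1, zero-padded cross-correlation: output keeps the input size,
    # so no spatial downsampling happens at this stage.
    pad = k // 2
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))
    h, w = x.shape[1:]
    out = np.zeros((c_out, h, w))
    for i in range(k):
        for j in range(k):
            window = xp[:, i:i + h, j:j + w]  # (C_in, H, W)
            out += np.einsum("oc,chw->ohw", kernel[:, :, i, j], window)
    return out


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    c_in, c_out, n, k = 3, 8, 4, 3
    x = rng.standard_normal((c_in, 16, 16))
    kernels = rng.standard_normal((n, c_out, c_in, k, k))
    out = dynamic_conv2d(
        x,
        kernels,
        rng.standard_normal((k * k, c_in)),
        rng.standard_normal((c_in, c_in)),
        rng.standard_normal((n, c_in)),
    )
    print(out.shape)  # (8, 16, 16)
```

In this sketch the sigmoid gates rescale kernel entries along the spatial and channel axes, while the softmax mixes the `n` candidate kernels into one. Only the small attention maps depend on the input, so the extra cost beyond an ordinary convolution stays modest.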