Multi-granularity representation learning with vision Mamba for infrared small target detection

Impact factor: 8.6 | JCR Q1 (Remote Sensing)

International Journal of Applied Earth Observation and Geoinformation (ITC Journal), Vol. 142, Article 104645 | Published: 2025-07-04 | DOI: 10.1016/j.jag.2025.104645

Yongji Li, Luping Wang, Shichao Chen


Citations: 0

Abstract





Heterogeneous environments and a low Signal-to-Clutter Ratio (SCR) pose a challenge for Infrared Small Target Detection (IRSTD). Convolutional Neural Networks (CNNs) lack a global view, while Transformers, with their quadratic computational complexity, struggle with local feature refinement. Inspired by the quad-directional scanning State Space Model (SSM), which models long-range dependencies with linear complexity, this research reconceptualizes the spatial and structural information of small targets in IR images, considering multi-granularity features and long-range dependencies of small targets simultaneously. Specifically, we tailor a nested structure in which global and local information cross-fertilize. Each layer of the top-level pyramid network embeds a tiny, well-configured contextual pyramid block to extract fine-grained features of small targets. A subsequent Mamba module restructures the feature maps to derive coarse-grained features of “visual sentences”. Fusing contextual information with local features achieves precise localization of small targets. Furthermore, we propose the Asymmetric Convolution (AConv) to replace the Depthwise Convolution (DWConv) in the Visual State Space (VSS) module and the regular convolution in each lateral connection of the nested pyramid network, reducing the number of parameters and the computational cost. Both qualitative and quantitative experiments demonstrate that our proposed model outperforms 12 recent baseline methods on two public datasets.
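To illustrate why a quad-directional scan gives every pixel a global view at linear cost, here is a minimal sketch under several assumptions that do not come from the paper: the four scan orders follow the VMamba-style cross-scan (row-major, column-major, and their reverses); the SSM is a toy scalar recurrence with fixed parameters `a`, `b`, `c`, not Mamba's input-dependent selective scan; and the function names (`scan_orders`, `linear_ssm`, `quad_directional_scan`) are hypothetical, chosen only for this example.

```python
from typing import List

Grid = List[List[float]]


def scan_orders(h: int, w: int) -> List[List[int]]:
    """Four flattening orders (flat index r * w + c), assuming a VMamba-style
    cross-scan: row-major, column-major, and both sequences reversed."""
    row_major = [r * w + c for r in range(h) for c in range(w)]
    col_major = [r * w + c for c in range(w) for r in range(h)]
    return [row_major, col_major, row_major[::-1], col_major[::-1]]


def linear_ssm(seq: List[float], a: float, b: float, c: float) -> List[float]:
    """Toy scalar state space recurrence: h_t = a*h_{t-1} + b*x_t, y_t = c*h_t.
    A single pass over the sequence, so the cost is O(L) in its length L."""
    state = 0.0
    out = []
    for x in seq:
        state = a * state + b * x
        out.append(c * state)
    return out


def quad_directional_scan(feat: Grid, a: float = 0.5, b: float = 1.0,
                          c: float = 1.0) -> Grid:
    """Scan a single-channel H x W map in four directions, run the toy SSM on
    each 1-D sequence, scatter outputs back to their pixels, and sum them."""
    h, w = len(feat), len(feat[0])
    flat = [v for row in feat for v in row]
    merged = [0.0] * (h * w)
    for order in scan_orders(h, w):
        seq = [flat[i] for i in order]      # flatten along this direction
        ys = linear_ssm(seq, a, b, c)       # causal, linear-time scan
        for pos, y in zip(order, ys):       # undo the permutation
            merged[pos] += y
    return [merged[r * w:(r + 1) * w] for r in range(h)]


if __name__ == "__main__":
    # A single bright pixel stands in for a small target on a dark background.
    target = [[0.0, 0.0, 0.0],
              [0.0, 1.0, 0.0],
              [0.0, 0.0, 0.0]]
    for row in quad_directional_scan(target):
        print(row)
    # [0.125, 0.625, 0.5]
    # [0.625, 4.0, 0.625]
    # [0.5, 0.625, 0.125]
```

Each individual scan is causal, so it only propagates the target's response "downstream" of that pixel; summing the four directions means every pixel in the map receives a (decayed) contribution, while the total cost stays at four O(HW) passes rather than the O((HW)^2) of full self-attention.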


Source journal

International Journal of Applied Earth Observation and Geoinformation (ITC Journal)
Subject areas: Global and Planetary Change; Management, Monitoring, Policy and Law; Earth-Surface Processes; Computers in Earth Sciences

CiteScore: 12.00

Self-citation rate: 0.00%

Review time: 77 days

About the journal: The International Journal of Applied Earth Observation and Geoinformation publishes original papers that utilize earth observation data for natural resource and environmental inventory and management. These data primarily originate from remote sensing platforms, including satellites and aircraft, supplemented by surface and subsurface measurements. Addressing natural resources such as forests, agricultural land, soils, and water, as well as environmental concerns like biodiversity, land degradation, and hazards, the journal explores conceptual and data-driven approaches. It covers geoinformation themes like capturing, databasing, visualization, interpretation, data quality, and spatial uncertainty.