Multi-granularity representation learning with vision Mamba for infrared small target detection

Impact factor: 8.6 | JCR category: Remote Sensing (Q1)
Yongji Li, Luping Wang, Shichao Chen
International Journal of Applied Earth Observation and Geoinformation, Vol. 142, Article 104645. Published 2025-07-04.
DOI: 10.1016/j.jag.2025.104645
URL: https://www.sciencedirect.com/science/article/pii/S1569843225002924
Citations: 0

Abstract

Heterogeneous environments and a low signal-to-clutter ratio (SCR) make infrared small target detection (IRSTD) challenging. Convolutional neural networks (CNNs) are limited by their lack of a global view, while Transformers, whose computational complexity is quadratic in the number of tokens, struggle to refine local features. Inspired by the quad-directional scanning state space model (SSM), which models long-range dependencies with linear complexity, this research reconceptualizes the spatial and structural information of small targets in infrared (IR) images, considering the multi-granularity features and long-range dependencies of small targets simultaneously. Specifically, we design a nested structure in which global and local information cross-fertilize. Each layer of the top-level pyramid network embeds a tiny, carefully configured contextual pyramid block that extracts fine-grained features of small targets. A subsequent Mamba module restructures the feature maps into “visual sentences” to derive coarse-grained features. Fusing this contextual information with the local features yields precise localization of small targets. Furthermore, we propose an asymmetric convolution (AConv) that replaces the depthwise convolution (DWConv) in the visual state space (VSS) module and the regular convolution in each lateral connection of the nested pyramid network, reducing the parameter count and computational cost. Both qualitative and quantitative experiments demonstrate that the proposed model outperforms 12 recent baseline methods on two public datasets.
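The abstract's central mechanism, quad-directional scanning with a linear-time SSM, can be illustrated with a minimal NumPy sketch. It assumes the VMamba-style cross-scan (row-major order, column-major order, and the reversal of each) and assumes the four outputs are merged by summation; it also replaces Mamba's selective, input-dependent parameters with fixed scalars `a`, `b`, `c`. The function names `cross_scan`, `cross_merge`, `linear_ssm` and `ss2d_sketch` are hypothetical and are not taken from the authors' code.

```python
import numpy as np


def cross_scan(x):
    """Flatten an (H, W) map into four 1-D sequences (assumed VMamba-style cross-scan)."""
    row = x.reshape(-1)    # row-major: left-to-right, then top-to-bottom
    col = x.T.reshape(-1)  # column-major: top-to-bottom, then left-to-right
    return [row, col, row[::-1], col[::-1]]  # plus both reversed orders


def cross_merge(seqs, h, w):
    """Undo each scan order and sum the four sequences back onto the (H, W) grid."""
    row, col, row_rev, col_rev = seqs
    out = row.reshape(h, w)
    out = out + col.reshape(w, h).T
    out = out + row_rev[::-1].reshape(h, w)
    out = out + col_rev[::-1].reshape(w, h).T
    return out


def linear_ssm(seq, a=0.9, b=1.0, c=1.0):
    """Toy scalar SSM: h_t = a*h_{t-1} + b*x_t, y_t = c*h_t, one O(L) pass.

    Simplification: real Mamba blocks use input-dependent (selective) A, B, C,
    multi-dimensional hidden states and a parallel scan.
    """
    h = 0.0
    y = np.empty_like(seq, dtype=float)
    for t, x_t in enumerate(seq):
        h = a * h + b * x_t
        y[t] = c * h
    return y


def ss2d_sketch(x, a=0.9):
    """Scan in four directions, run the SSM on each sequence, merge back.

    Cost is four passes over H*W elements, i.e. O(HW), versus O((HW)^2)
    for full self-attention over the same pixels.
    """
    h, w = x.shape
    seqs = [linear_ssm(s, a=a) for s in cross_scan(x)]
    return cross_merge(seqs, h, w)


if __name__ == "__main__":
    # A single bright pixel in the top-left corner of a 3x3 map.
    delta = np.zeros((3, 3))
    delta[0, 0] = 1.0
    print(ss2d_sketch(delta, a=0.5))
```

With a single bright pixel at the top-left corner and `a > 0`, every output pixel receives a decayed contribution after one pass per direction, which is how the scan propagates long-range context without the quadratic cost of attention. The paper's actual VSS module, its AConv substitution and its fusion with the contextual pyramid block are not reproduced here.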
Source journal
International journal of applied earth observation and geoinformation : ITC journal
Subject areas: Global and Planetary Change; Management, Monitoring, Policy and Law; Earth-Surface Processes; Computers in Earth Sciences
CiteScore: 12.00
Self-citation rate: 0.00%
Articles published: 0
Review time: 77 days
Journal description: The International Journal of Applied Earth Observation and Geoinformation publishes original papers that utilize earth observation data for natural resource and environmental inventory and management. These data primarily originate from remote sensing platforms, including satellites and aircraft, supplemented by surface and subsurface measurements. Addressing natural resources such as forests, agricultural land, soils, and water, as well as environmental concerns like biodiversity, land degradation, and hazards, the journal explores conceptual and data-driven approaches. It covers geoinformation themes like capturing, databasing, visualization, interpretation, data quality, and spatial uncertainty.