Improved SAR Aircraft Detection Algorithm Based on Visual State Space Models

IF 1.5; Zone 4, Computer Science; JCR Q4, COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

IET Computer Vision, published 2025-07-07, DOI: 10.1049/cvi2.70032

Yaqiong Wang, Jing Zhang, Yipei Wang, Shiyu Hu, Baoguo Shen, Zhenhua Hou, Wanting Zhou

IET Computer Vision, Volume 19, Issue 1. Article page: https://onlinelibrary.wiley.com/doi/10.1049/cvi2.70032

Citations: 0

Abstract

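In recent years, advances in deep learning have significantly extended the use of synthetic aperture radar (SAR) aircraft detection in remote sensing and military applications. However, existing methods face a dual dilemma: CNN-based models are limited by their local receptive fields and therefore lack detection accuracy, whereas Transformer-based models gain accuracy through attention mechanisms but incur heavy computational overhead because attention scales quadratically with the number of tokens. This imbalance between accuracy and efficiency severely limits progress in SAR aircraft detection. To address it, this paper proposes a novel neural network based on state space models (SSMs), termed the Mamba SAR detection network (MSAD). Specifically, we design a feature encoding module, MEBlock, that integrates CNNs with SSMs to strengthen global feature modelling. Because SSMs have linear computational complexity, lower than the quadratic complexity of Transformer architectures, this design also reduces computational overhead. Additionally, we propose a context-aware feature fusion module (CAFF) that uses attention mechanisms to fuse multi-scale features adaptively. Lastly, a lightweight parameter-shared detection head (PSHead) reduces redundant parameters through implicit feature interaction. Experiments on the SAR-AirCraft-v1.0 and SADD datasets show that MSAD achieves higher accuracy than existing algorithms, while RT-DETR, a Transformer-based detector, requires 2.7 times as many GFLOPs as MSAD. These results validate the core role of SSMs in balancing accuracy and efficiency and demonstrate MSAD's perceptual capability and performance in SAR aircraft detection in complex environments.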
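As a rough illustration of why the abstract credits SSMs with linear complexity, the sketch below runs a generic diagonal state space recurrence (zero-order-hold discretisation, in the style of S4/Mamba-type models) over a flattened token sequence and compares its cost with the number of pairwise scores in full self-attention. It is not MSAD's MEBlock: Mamba's selective (input-dependent) parameters, any 2D scan ordering, and the CNN branch are all omitted, and the function names, the 80 × 80 feature-map size and the state size of 16 are hypothetical choices for illustration.

```python
import math
from typing import List, Sequence


def ssm_scan(
    x: Sequence[float],
    a: Sequence[float],
    b: Sequence[float],
    c: Sequence[float],
    delta: float,
) -> List[float]:
    """Run a generic diagonal linear SSM over a 1-D token sequence (sketch only).

    Continuous model per state channel n: h_n'(t) = a[n] * h_n(t) + b[n] * x(t),
    output y(t) = sum_n c[n] * h_n(t). Requires a[n] != 0 (a[n] < 0 for stability).
    Zero-order-hold discretisation with step `delta`:
        a_bar[n] = exp(delta * a[n])
        b_bar[n] = (a_bar[n] - 1) / a[n] * b[n]
    Each token costs O(N) work, so the whole scan is O(L * N).
    """
    n_state = len(a)
    a_bar = [math.exp(delta * a_n) for a_n in a]
    b_bar = [(a_bar[n] - 1.0) / a[n] * b[n] for n in range(n_state)]
    h = [0.0] * n_state
    y: List[float] = []
    for x_t in x:
        # Recurrent update: the hidden state summarises all earlier tokens,
        # so global context is carried without comparing every token pair.
        h = [a_bar[n] * h[n] + b_bar[n] * x_t for n in range(n_state)]
        y.append(sum(c[n] * h[n] for n in range(n_state)))
    return y


def scan_cost(seq_len: int, n_state: int) -> int:
    """State updates performed by the scan: linear in sequence length."""
    return seq_len * n_state


def attention_pairs(seq_len: int) -> int:
    """Query-key scores in full self-attention: quadratic in sequence length."""
    return seq_len * seq_len


if __name__ == "__main__":
    # Impulse response of a single decaying state channel.
    y = ssm_scan([1.0, 0.0, 0.0, 0.0], a=[-1.0], b=[1.0], c=[1.0], delta=1.0)
    print([round(v, 4) for v in y])

    # A hypothetical 80 x 80 feature map flattened row-major into one sequence.
    L = 80 * 80
    print(scan_cost(L, n_state=16), attention_pairs(L))  # prints: 102400 40960000
```

Doubling the sequence length doubles `scan_cost` but quadruples `attention_pairs`, which is the scaling argument behind the efficiency claim. Real GFLOPs also depend on channel widths, the convolutional branch and hardware-aware kernels, so these counts are not the paper's measurements.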

Source journal

IET Computer Vision (Engineering & Technology: Engineering, Electrical & Electronic)

CiteScore: 3.30
Self-citation rate: 11.80%
Articles published:
Review time: 3.4 months

Journal description: IET Computer Vision seeks original research papers in a wide range of areas of computer vision. The journal's vision is to publish the highest-quality research that is relevant and topical to the field, while also welcoming work that aims to open new horizons and set the agenda for future avenues of research in computer vision.

IET Computer Vision welcomes submissions on the following topics: Biologically and perceptually motivated approaches to low-level vision (feature detection, etc.); Perceptual grouping and organisation; Representation, analysis and matching of 2D and 3D shape; Shape-from-X; Object recognition; Image understanding; Learning with visual inputs; Motion analysis and object tracking; Multiview scene analysis; Cognitive approaches in low-, mid- and high-level vision; Control in visual systems; Colour, reflectance and light; Statistical and probabilistic models; Face and gesture; Surveillance; Biometrics and security; Robotics; Vehicle guidance; Automatic model acquisition; Medical image analysis and understanding; Aerial scene analysis and remote sensing; Deep learning models in computer vision. Both methodological and application-oriented papers are welcome.

Submitted manuscripts are expected to include a detailed, analytical review of the literature, a state-of-the-art exposition of the proposed original research and its methodology, a thorough experimental evaluation and, last but not least, a comparative evaluation against relevant state-of-the-art methods. Submissions that do not meet these minimum requirements may be returned to the authors without being sent for review.

Special issues, current calls for papers: Computer Vision for Smart Cameras and Camera Networks (https://digital-library.theiet.org/files/IET_CVI_SC.pdf); Computer Vision for the Creative Industries (https://digital-library.theiet.org/files/IET_CVI_CVCI.pdf).