{"title":"MLMamba: A Mamba-Based Efficient Network for Multi-Label Remote Sensing Scene Classification","authors":"Ruiqi Du;Xu Tang;Jingjing Ma;Xiangrong Zhang;Licheng Jiao","doi":"10.1109/TCSVT.2025.3535939","DOIUrl":null,"url":null,"abstract":"As a useful remote sensing (RS) scene interpretation technique, multi-label RS scene classification (RSSC) always attracts researchers’ attention and plays an important role in the RS community. To assign multiple semantic labels to a single RS image according to its complex contents, the existing methods focus on learning the valuable visual features and mining the latent semantic relationships from the RS images. This is a feasible and helpful solution. However, they are often associated with high computational costs due to the widespread use of Transformers. To alleviate this problem, we propose a Mamba-based efficient network based on the newly emerged state space model called MLMamba. In addition to the basic feature extractor (convolutional neural network and language model) and classifier (multiple perceptrons), MLMamba consists of two key components: a pyramid Mamba and a feature-guided semantic modeling (FGSM) Mamba. Pyramid Mamba uses multi-scale scanning to establish global relationships within and across different scales, improving MLMamba’s ability to explore RS images. Under the guidance of the obtained visual features, FGSM Mamba establishes associations between different land covers. Combining these two components can deeply mine local features, multi-scale information, and long-range dependencies from RS images and build semantic relationships between different surface covers. These superiorities guarantee that MLMamba can fully understand the complex contents within RS images and accurately determine which categories exist. 
Furthermore, the simple and effective structure and linear computational complexity of the state space model ensure that pyramid Mamba and FGSM Mamba will not impose too much computational burden on MLMamba. Extensive experiments counted on three benchmark multi-label RSSC data sets validate the effectiveness of MLMamba. The positive results demonstrate that MLMamba achieves state-of-the-art performance, surpassing existing methods in accuracy, model size, and computational efficiency. Our source codes are available at <uri>https://github.com/TangXu-Group/ multilabelRSSC/tree/main/MLMamba</uri>.","PeriodicalId":13082,"journal":{"name":"IEEE Transactions on Circuits and Systems for Video Technology","volume":"35 7","pages":"6245-6258"},"PeriodicalIF":11.1000,"publicationDate":"2025-01-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Circuits and Systems for Video Technology","FirstCategoryId":"5","ListUrlMain":"https://ieeexplore.ieee.org/document/10857393/","RegionNum":1,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"ENGINEERING, ELECTRICAL & ELECTRONIC","Score":null,"Total":0}
Citations: 0
Abstract
As a useful remote sensing (RS) scene interpretation technique, multi-label RS scene classification (RSSC) has long attracted researchers’ attention and plays an important role in the RS community. To assign multiple semantic labels to a single RS image according to its complex contents, existing methods focus on learning valuable visual features and mining latent semantic relationships from RS images. This is a feasible and helpful solution, but it often incurs high computational costs due to the widespread use of Transformers. To alleviate this problem, we propose MLMamba, an efficient network built on the recently emerged Mamba state space model. In addition to a basic feature extractor (a convolutional neural network and a language model) and a classifier (multi-layer perceptrons), MLMamba comprises two key components: a pyramid Mamba and a feature-guided semantic modeling (FGSM) Mamba. The pyramid Mamba uses multi-scale scanning to establish global relationships within and across different scales, improving MLMamba’s ability to explore RS images. Guided by the obtained visual features, the FGSM Mamba establishes associations between different land covers. Together, these two components deeply mine local features, multi-scale information, and long-range dependencies from RS images and build semantic relationships between different surface covers, enabling MLMamba to fully understand the complex contents of RS images and accurately determine which categories are present. Furthermore, the simple yet effective structure and linear computational complexity of the state space model ensure that the pyramid Mamba and the FGSM Mamba impose little additional computational burden on MLMamba. Extensive experiments conducted on three benchmark multi-label RSSC data sets validate the effectiveness of MLMamba.
The positive results demonstrate that MLMamba achieves state-of-the-art performance, surpassing existing methods in accuracy, model size, and computational efficiency. Our source code is available at https://github.com/TangXu-Group/multilabelRSSC/tree/main/MLMamba.
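The abstract's efficiency claim rests on the linear computational complexity of the state space model: unlike self-attention, which compares all token pairs (quadratic in sequence length), an SSM processes the sequence in a single recurrent scan. The sketch below illustrates this recurrence in a minimal, scalar-input form; the function name and the fixed (non-selective) matrices are illustrative assumptions, not the paper's actual MLMamba implementation.

```python
import numpy as np

def ssm_scan(x, A, B, C):
    """Minimal linear-time state space model recurrence (illustrative):
        h_t = A @ h_{t-1} + B * x_t    (state update)
        y_t = C @ h_t                  (readout)
    One pass over the sequence costs O(L) in sequence length L,
    versus the O(L^2) pairwise interactions of self-attention."""
    d = A.shape[0]              # hidden state dimension
    h = np.zeros(d)             # initial state h_0 = 0
    y = np.empty(len(x))
    for t in range(len(x)):
        h = A @ h + B * x[t]    # fold the new input into the state
        y[t] = C @ h            # project the state to an output
    return y
```

With A set to zero the state carries no history, so each output reduces to (C . B) * x_t; choosing B as ones and C as halves in a 2-dimensional state then reproduces the input, which makes the recurrence easy to sanity-check. Mamba additionally makes A, B, and C input-dependent ("selective"), but the linear scan structure is the same.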
About the journal:
The IEEE Transactions on Circuits and Systems for Video Technology (TCSVT) is dedicated to covering all aspects of video technologies from a circuits and systems perspective. We encourage submissions of general, theoretical, and application-oriented papers related to image and video acquisition, representation, presentation, and display. Additionally, we welcome contributions in areas such as processing, filtering, and transforms; analysis and synthesis; learning and understanding; compression, transmission, communication, and networking; as well as storage, retrieval, indexing, and search. Furthermore, papers focusing on hardware and software design and implementation are highly valued. Join us in advancing the field of video technology through innovative research and insights.