GLMambaNet: Mamba-based decoder with local detail enhancement for semantic segmentation of remote sensing imagery

IF 4.2 · CAS Tier 3, Computer Science · JCR Q2, COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE
Zhengyu Zhu, Xinaoxue Zhang, Xiaobo Zhang, Zixuan Zhao, Feng Chen
{"title":"GLMambaNet:用于遥感图像语义分割的基于mamba的局部细节增强解码器","authors":"Zhengyu Zhu ,&nbsp;Xinaoxue Zhang ,&nbsp;Xiaobo Zhang ,&nbsp;Zixuan Zhao ,&nbsp;Feng Chen","doi":"10.1016/j.imavis.2025.105774","DOIUrl":null,"url":null,"abstract":"<div><div>Accurate semantic segmentation of high-resolution remote sensing imagery is critical for land cover classification, supporting applications ranging from urban infrastructure planning to ecological conservation. In the context of remote sensing, this task is particularly challenging due to the high spatial resolution, spectral complexity, and the presence of small or irregularly shaped objects. Existing methods often struggle to balance global context modeling and local detail preservation—both essential for precise segmentation in complex scenes. This motivates the design of new architectures capable of capturing long-range dependencies while remaining sensitive to fine-grained spatial details, without incurring excessive computational cost. While Transformer architectures effectively model long-range dependencies, their quadratic complexity limits scalability for high-resolution imagery. To address these challenges, we present GLMambaNet, a dual-stream architecture that combines Swin Transformer’s hierarchical encoding with a novel Mamba-based decoder. The framework introduces two core components: the Mamba Global Context Module (MGCM), which leverages state space modeling with channel attention to enhance global–local context integration, and the Local Detail Enhancement Module (LDEM), which improves boundary and texture preservation through gradient-aware convolutions. On the Vaihingen dataset, our model achieves a mean F1-score of 91.82% and mIoU of 85.29%, surpassing CNN- and Transformer-based baselines in capturing fine details such as vehicle edges and shadows. On the Potsdam dataset, it achieves an mIoU of 87.58%, delivering enhanced performance across key classes including buildings, trees, and cars. These results demonstrate that GLMambaNet effectively balances segmentation accuracy and model complexity, providing a strong foundation for practical remote sensing applications.</div></div>","PeriodicalId":50374,"journal":{"name":"Image and Vision Computing","volume":"163 ","pages":"Article 105774"},"PeriodicalIF":4.2000,"publicationDate":"2025-10-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"GLMambaNet: Mamba-based decoder with local detail enhancement for semantic segmentation of remote sensing imagery\",\"authors\":\"Zhengyu Zhu ,&nbsp;Xinaoxue Zhang ,&nbsp;Xiaobo Zhang ,&nbsp;Zixuan Zhao ,&nbsp;Feng Chen\",\"doi\":\"10.1016/j.imavis.2025.105774\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><div>Accurate semantic segmentation of high-resolution remote sensing imagery is critical for land cover classification, supporting applications ranging from urban infrastructure planning to ecological conservation. In the context of remote sensing, this task is particularly challenging due to the high spatial resolution, spectral complexity, and the presence of small or irregularly shaped objects. Existing methods often struggle to balance global context modeling and local detail preservation—both essential for precise segmentation in complex scenes. This motivates the design of new architectures capable of capturing long-range dependencies while remaining sensitive to fine-grained spatial details, without incurring excessive computational cost. 
While Transformer architectures effectively model long-range dependencies, their quadratic complexity limits scalability for high-resolution imagery. To address these challenges, we present GLMambaNet, a dual-stream architecture that combines Swin Transformer’s hierarchical encoding with a novel Mamba-based decoder. The framework introduces two core components: the Mamba Global Context Module (MGCM), which leverages state space modeling with channel attention to enhance global–local context integration, and the Local Detail Enhancement Module (LDEM), which improves boundary and texture preservation through gradient-aware convolutions. On the Vaihingen dataset, our model achieves a mean F1-score of 91.82% and mIoU of 85.29%, surpassing CNN- and Transformer-based baselines in capturing fine details such as vehicle edges and shadows. On the Potsdam dataset, it achieves an mIoU of 87.58%, delivering enhanced performance across key classes including buildings, trees, and cars. These results demonstrate that GLMambaNet effectively balances segmentation accuracy and model complexity, providing a strong foundation for practical remote sensing applications.</div></div>\",\"PeriodicalId\":50374,\"journal\":{\"name\":\"Image and Vision Computing\",\"volume\":\"163 \",\"pages\":\"Article 105774\"},\"PeriodicalIF\":4.2000,\"publicationDate\":\"2025-10-17\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Image and Vision Computing\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S0262885625003622\",\"RegionNum\":3,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q2\",\"JCRName\":\"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Image and Vision Computing","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0262885625003622","RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
Citations: 0

Abstract

Accurate semantic segmentation of high-resolution remote sensing imagery is critical for land cover classification, supporting applications ranging from urban infrastructure planning to ecological conservation. In the context of remote sensing, this task is particularly challenging due to the high spatial resolution, spectral complexity, and the presence of small or irregularly shaped objects. Existing methods often struggle to balance global context modeling and local detail preservation—both essential for precise segmentation in complex scenes. This motivates the design of new architectures capable of capturing long-range dependencies while remaining sensitive to fine-grained spatial details, without incurring excessive computational cost. While Transformer architectures effectively model long-range dependencies, their quadratic complexity limits scalability for high-resolution imagery. To address these challenges, we present GLMambaNet, a dual-stream architecture that combines Swin Transformer’s hierarchical encoding with a novel Mamba-based decoder. The framework introduces two core components: the Mamba Global Context Module (MGCM), which leverages state space modeling with channel attention to enhance global–local context integration, and the Local Detail Enhancement Module (LDEM), which improves boundary and texture preservation through gradient-aware convolutions. On the Vaihingen dataset, our model achieves a mean F1-score of 91.82% and mIoU of 85.29%, surpassing CNN- and Transformer-based baselines in capturing fine details such as vehicle edges and shadows. On the Potsdam dataset, it achieves an mIoU of 87.58%, delivering enhanced performance across key classes including buildings, trees, and cars. These results demonstrate that GLMambaNet effectively balances segmentation accuracy and model complexity, providing a strong foundation for practical remote sensing applications.
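The abstract describes the MGCM as pairing state space modeling with channel attention to integrate global and local context. As a rough picture of how such a pairing can be wired, here is a minimal PyTorch sketch. Everything in it is an assumption for illustration only: the class name MGCMSketch, the simplified diagonal-recurrence scan, and the SE-style channel gate are not the paper's implementation, and real Mamba blocks use a selective, hardware-aware scan that this sketch does not reproduce.

```python
# Illustrative only: a hedged sketch of combining a state-space scan with
# channel attention, NOT the authors' MGCM implementation.
import torch
import torch.nn as nn


class MGCMSketch(nn.Module):
    """Hypothetical stand-in for the Mamba Global Context Module: a
    simplified diagonal state-space recurrence over flattened spatial
    tokens, gated by SE-style channel attention (both assumptions)."""

    def __init__(self, channels: int, reduction: int = 4):
        super().__init__()
        # Per-channel decay a in (0, 1): h_t = a * h_{t-1} + x_t.
        self.log_a = nn.Parameter(torch.zeros(channels))
        self.proj = nn.Conv2d(channels, channels, kernel_size=1)
        self.se = nn.Sequential(  # squeeze-and-excitation channel gate
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        a = torch.sigmoid(self.log_a)              # (C,)
        seq = x.flatten(2)                         # (B, C, H*W)
        state = torch.zeros(b, c, device=x.device, dtype=x.dtype)
        ctx = []
        # Plain O(L) recurrence; Mamba's selective scan is far faster.
        for t in range(seq.shape[-1]):
            state = a * state + seq[:, :, t]
            ctx.append(state)
        ctx = torch.stack(ctx, dim=-1).view(b, c, h, w)
        gate = self.se(x.mean(dim=(2, 3))).view(b, c, 1, 1)
        return x + gate * self.proj(ctx)           # residual global context


# Shape check: a 64-channel feature map passes through unchanged in size.
feats = torch.randn(2, 64, 16, 16)
print(MGCMSketch(64)(feats).shape)                 # torch.Size([2, 64, 16, 16])
```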
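Likewise, "gradient-aware convolutions" in the LDEM can be pictured as convolutions that see explicit edge responses. The sketch below uses fixed depthwise Sobel filters whose outputs are fused back into the features; the Sobel choice, the residual fusion, and the name LDEMSketch are assumptions, since the abstract does not give the module's exact formulation.

```python
# Illustrative only: one plausible reading of "gradient-aware convolutions",
# using fixed Sobel filters; NOT the paper's LDEM implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F


class LDEMSketch(nn.Module):
    """Hypothetical stand-in for the Local Detail Enhancement Module:
    depthwise Sobel filters expose horizontal/vertical gradients, and a
    3x3 convolution fuses them back to sharpen boundaries and texture."""

    def __init__(self, channels: int):
        super().__init__()
        sobel_x = torch.tensor([[-1., 0., 1.],
                                [-2., 0., 2.],
                                [-1., 0., 1.]])
        kernel = torch.stack([sobel_x, sobel_x.t()])             # (2, 3, 3)
        kernel = kernel.unsqueeze(1).repeat(channels, 1, 1, 1)   # (2C, 1, 3, 3)
        self.register_buffer("kernel", kernel)                   # fixed, not learned
        self.channels = channels
        self.fuse = nn.Conv2d(2 * channels, channels, kernel_size=3, padding=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # groups=C applies both Sobel filters to every channel independently.
        grad = F.conv2d(x, self.kernel, padding=1, groups=self.channels)
        return x + self.fuse(grad)                               # residual detail path
```

A residual design like this keeps the main feature path intact and lets the gradient branch contribute only the high-frequency corrections, which is one common way to preserve small objects such as vehicle edges.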
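For reference, the reported mIoU is the unweighted mean of per-class intersection-over-union. A minimal computation from a confusion matrix looks like this (the function name and variable names are illustrative, not from the paper):

```python
import numpy as np


def mean_iou(conf: np.ndarray) -> float:
    """mIoU from a KxK confusion matrix (rows: ground truth, cols: prediction).
    IoU_k = TP_k / (TP_k + FP_k + FN_k); mIoU is the unweighted class mean."""
    tp = np.diag(conf).astype(float)
    fp = conf.sum(axis=0) - tp   # predicted as class k but not k
    fn = conf.sum(axis=1) - tp   # actually class k but predicted otherwise
    denom = tp + fp + fn
    iou = np.divide(tp, denom, out=np.zeros_like(tp), where=denom > 0)
    return float(iou.mean())
```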
Source journal
Image and Vision Computing (Engineering Technology - Electrical & Electronic Engineering)
CiteScore: 8.50
Self-citation rate: 8.50%
Annual publications: 143
Review time: 7.8 months
Journal description: Image and Vision Computing has as a primary aim the provision of an effective medium of interchange for the results of high quality theoretical and applied research fundamental to all aspects of image interpretation and computer vision. The journal publishes work that proposes new image interpretation and computer vision methodology or addresses the application of such methods to real world scenes. It seeks to strengthen a deeper understanding in the discipline by encouraging the quantitative comparison and performance evaluation of the proposed methodology. The coverage includes: image interpretation, scene modelling, object recognition and tracking, shape analysis, monitoring and surveillance, active vision and robotic systems, SLAM, biologically-inspired computer vision, motion analysis, stereo vision, document image understanding, character and handwritten text recognition, face and gesture recognition, biometrics, vision-based human-computer interaction, human activity and behavior understanding, data fusion from multiple sensor inputs, image databases.