Huihui Li; Huajian Pan; Xiaoyong Liu; Jinchang Ren; Zhiguo Du; Jingjing Cao

GLVMamba: A Global–Local Visual State-Space Model for Remote Sensing Image Segmentation
IEEE Transactions on Geoscience and Remote Sensing, vol. 63, pp. 1–15
DOI: 10.1109/TGRS.2025.3572127
Publication date: 2025-03-23 (Journal Article)
URL: https://ieeexplore.ieee.org/document/11014226/
Impact factor: 8.6; JCR: Q1 (Engineering, Electrical & Electronic); CAS region: 1 (Earth Science)
Citations: 0
Abstract
Semantic segmentation of remote sensing images (RSIs) has made significant advances with the adoption of deep neural networks, which combine the strengths of convolutional neural networks (CNNs) in local feature extraction with those of transformers in global information modeling. However, owing to the limited long-range modeling capability of CNNs and the computational complexity of transformers, remote sensing (RS) semantic segmentation still suffers from serious holes, rough edge segmentation, and false or even missed detections caused by light, shadow, and other factors. To address these issues, we propose a visual state-space (VSS) model called GLVMamba, which uses CNNs as the encoder and the proposed global–local VSS (GLVSS) block as the core of the decoder. Specifically, the GLVSS block introduces locality feed-forward and a shifted-window mechanism to address Mamba's insufficient modeling of neighboring-pixel dependencies, which enhances the integration of global and local context during feature reconstruction, boosts the model's object-perception capability, and effectively refines edge contours. In addition, a scale-aware pyramid pooling (SCPP) module is proposed to fully merge features from various scales and to adaptively fuse and extract discriminative features, mitigating holes and false detections. GLVMamba effectively captures global–local semantic information and multiscale features through the GLVSS block and the SCPP module, achieving efficient and accurate RS semantic segmentation. Extensive experiments on two widely used datasets demonstrate the superiority of the proposed method over other state-of-the-art methods. The code will be available at https://github.com/Tokisakiwlp/GLVMamba
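The abstract does not specify how SCPP fuses multiscale features internally, but the general pyramid-pooling idea it builds on can be sketched as follows: pool a feature map at several output resolutions, upsample each pooled branch back to the input size, and concatenate the branches with the original features. This is a minimal NumPy sketch of generic pyramid pooling, not the paper's SCPP module; the function names and scale choices are illustrative assumptions.

```python
import numpy as np

def adaptive_avg_pool(x, out_size):
    """Average-pool a (C, H, W) feature map down to (C, out_size, out_size)."""
    c, h, w = x.shape
    out = np.zeros((c, out_size, out_size))
    for i in range(out_size):
        for j in range(out_size):
            # Bin boundaries, rounding so every input pixel is covered.
            h0, h1 = (i * h) // out_size, ((i + 1) * h + out_size - 1) // out_size
            w0, w1 = (j * w) // out_size, ((j + 1) * w + out_size - 1) // out_size
            out[:, i, j] = x[:, h0:h1, w0:w1].mean(axis=(1, 2))
    return out

def upsample_nearest(x, h, w):
    """Nearest-neighbor upsampling of (C, ph, pw) to (C, h, w)."""
    c, ph, pw = x.shape
    rows = (np.arange(h) * ph) // h
    cols = (np.arange(w) * pw) // w
    return x[:, rows][:, :, cols]

def pyramid_pool(x, scales=(1, 2, 4)):
    """Pool at several scales, upsample back, and concatenate with the input.

    Returns a (C * (1 + len(scales)), H, W) tensor; a real module would
    follow this with a learned 1x1 convolution to fuse the branches.
    """
    c, h, w = x.shape
    branches = [x] + [upsample_nearest(adaptive_avg_pool(x, s), h, w)
                      for s in scales]
    return np.concatenate(branches, axis=0)

feat = np.random.rand(8, 16, 16)   # toy (C, H, W) feature map
fused = pyramid_pool(feat)
print(fused.shape)                 # (32, 16, 16)
```

In practice each branch would carry learned weights (e.g., a 1x1 convolution per scale), and an adaptive fusion step, such as the one SCPP proposes, would replace the plain concatenation.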
Journal description:
IEEE Transactions on Geoscience and Remote Sensing (TGRS) is a monthly publication that focuses on the theory, concepts, and techniques of science and engineering as applied to sensing the land, oceans, atmosphere, and space; and the processing, interpretation, and dissemination of this information.