SpectMamba: Remote sensing change detection network integrating frequency and visual state space model
Authors: Zhiwei Dong, Dapeng Cheng, Jinjiang Li
Journal: Expert Systems with Applications, volume 287, Article 127902 (JCR Q1, Computer Science, Artificial Intelligence; impact factor 7.5)
DOI: 10.1016/j.eswa.2025.127902
Published: 2025-05-08
URL: https://www.sciencedirect.com/science/article/pii/S0957417425015246
Citations: 0
Abstract
In recent years, hybrid models that fuse Convolutional Neural Networks (CNNs) with Transformers have combined the former's efficiency in local feature extraction with the latter's ability to capture long-range dependencies, and have shown strong modeling potential. However, some high-frequency subtle changes and periodic structural changes (e.g., regularly arranged clusters of reconstructed buildings) in multispectral remote sensing images are often difficult to detect in the spatial domain; at the same time, the high computational complexity of the Transformer restricts its practical application. Recently, the Mamba architecture, which is based on state space models, has performed well on the remote sensing change detection (RSCD) task, efficiently learning global image information with linear complexity. Building on this, this study hypothesizes that combining spectral layers with visual state space (VSS) modules can address these challenges more efficiently in dense prediction tasks. Specifically, we propose using spectral layers in the initial layers and VSS layers in the deeper layers, and verify the effectiveness of this strategy through extensive experiments. We also identify the limitations of VSS in independently processing the high-frequency information output by the spectral layer, and develop Conv-VSS to address them. These techniques are integrated and extended into a network called SpectMamba, which fuses the spectral layer and Conv-VSS to capture feature representations more appropriately by analyzing feature maps in both the frequency and spatial domains, while avoiding the complexity of the high-dimensional matrix operations in self-attention. Extensive experiments on three publicly available datasets show that SpectMamba significantly outperforms existing techniques on several performance metrics.
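The abstract does not specify how the spectral layer is built, so the following is only an illustrative sketch of the general idea, not the authors' implementation. It assumes a global-filter style layer (FFT, element-wise multiplication by a complex weight, inverse FFT) and shows why a periodic pattern that is spread across the spatial domain collapses to a single peak in the frequency domain. The names `spectral_layer`, `dominant_frequency`, and the `weight` parameter are hypothetical.

```python
import numpy as np


def spectral_layer(x, weight):
    """Filter a feature map in the frequency domain (assumed design, not from the paper).

    x:      real array of shape (H, W, C)
    weight: complex array of shape (H, W // 2 + 1, C); stands in for a learnable filter
    """
    h, w, _ = x.shape
    freq = np.fft.rfft2(x, axes=(0, 1))   # spatial domain -> frequency domain
    freq = freq * weight                   # re-weight each frequency per channel
    return np.fft.irfft2(freq, s=(h, w), axes=(0, 1))  # back to the spatial domain


def dominant_frequency(img):
    """Return the (row, col) index of the strongest non-DC frequency of a 2-D image."""
    spec = np.abs(np.fft.rfft2(img))
    spec[0, 0] = 0.0  # ignore the mean (DC) component
    return np.unravel_index(np.argmax(spec), spec.shape)


if __name__ == "__main__":
    # A stripe pattern with 4 cycles across a 32-pixel-wide image, standing in
    # for a regularly arranged structure in a difference image.
    cols = np.arange(32)
    stripes = np.tile(np.cos(2 * np.pi * 4 * cols / 32), (32, 1))
    print(dominant_frequency(stripes))

    # With an all-ones weight the layer is an identity mapping.
    x = np.random.default_rng(0).normal(size=(8, 8, 3))
    identity = np.ones((8, 5, 3), dtype=complex)
    print(np.allclose(spectral_layer(x, identity), x))
```

In a trained network the weight would be learned rather than fixed, and the output would feed the deeper VSS (or Conv-VSS) layers described above; those layers are not sketched here because the abstract gives no detail on their internals.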
About the journal:
Expert Systems With Applications is an international journal dedicated to the exchange of information on expert and intelligent systems used globally in industry, government, and universities. The journal emphasizes original papers covering the design, development, testing, implementation, and management of these systems, offering practical guidelines. It spans various sectors such as finance, engineering, marketing, law, project management, information management, medicine, and more. The journal also welcomes papers on multi-agent systems, knowledge management, neural networks, knowledge discovery, data mining, and other related areas, excluding applications to military/defense systems.