Wenqian Chen, Wendie Yue, Kai Chang, Hongzhi Wang, Kaijun Tan, Xinyu Liu, Xiaoyi Cao
{"title":"基于动态特征增强和多模态对齐融合的遥感图像高效语义分割","authors":"Wenqian Chen , Wendie Yue , Kai Chang , Hongzhi Wang , Kaijun Tan , Xinyu Liu , Xiaoyi Cao","doi":"10.1016/j.neucom.2025.131555","DOIUrl":null,"url":null,"abstract":"<div><div>Multimodal fusion has shown promising applications in integrating information from different modalities. However, existing multimodal fusion approaches in remote sensing face two main challenges: First, multimodal fusion models relying on Convolutional Neural Networks (CNNs) or Visual Transformers (ViTs) have limitations in terms of remote modeling capabilities and computational complexity, while state-space model (SSM)-based fusion models are prone to feature redundancy due to the use of multiple scanning paths, and similarly suffer from high computational complexity. Second, existing methods do not fully address inter-modal heterogeneity, leading to poor multimodal data fusion. To address these issues, we propose an efficient multimodal fusion network, AFMamba, based on the state-space model (SSM) for semantic segmentation of remote sensing images. Specifically, we design the Efficient Dynamic Visual State Space (EDVSS) module, which enhances the efficiency of the standard Mamba model by dynamically improving local features and reducing channel redundancy. Furthermore, we introduce the Cross Attention Alignment Fusion (CAAFM) module, which combines cross-image attention fusion and channel interaction alignment to effectively improve the accuracy and efficiency of cross-modal feature fusion and mitigate feature inconsistency. Experimental results demonstrate that in multimodal hyperspectral image semantic segmentation, the proposed model reduces computational complexity, measured in GFLOPs, by at least 61 % while maintaining a low parameter count, achieving optimal overall accuracy (OA) of around 92 %, and effectively balancing performance and computational efficiency.</div></div>","PeriodicalId":19268,"journal":{"name":"Neurocomputing","volume":"657 ","pages":"Article 131555"},"PeriodicalIF":6.5000,"publicationDate":"2025-09-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Efficient semantic segmentation of remote sensing images through dynamic feature enhancement and multimodal alignment fusion\",\"authors\":\"Wenqian Chen , Wendie Yue , Kai Chang , Hongzhi Wang , Kaijun Tan , Xinyu Liu , Xiaoyi Cao\",\"doi\":\"10.1016/j.neucom.2025.131555\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><div>Multimodal fusion has shown promising applications in integrating information from different modalities. However, existing multimodal fusion approaches in remote sensing face two main challenges: First, multimodal fusion models relying on Convolutional Neural Networks (CNNs) or Visual Transformers (ViTs) have limitations in terms of remote modeling capabilities and computational complexity, while state-space model (SSM)-based fusion models are prone to feature redundancy due to the use of multiple scanning paths, and similarly suffer from high computational complexity. Second, existing methods do not fully address inter-modal heterogeneity, leading to poor multimodal data fusion. To address these issues, we propose an efficient multimodal fusion network, AFMamba, based on the state-space model (SSM) for semantic segmentation of remote sensing images. 
Specifically, we design the Efficient Dynamic Visual State Space (EDVSS) module, which enhances the efficiency of the standard Mamba model by dynamically improving local features and reducing channel redundancy. Furthermore, we introduce the Cross Attention Alignment Fusion (CAAFM) module, which combines cross-image attention fusion and channel interaction alignment to effectively improve the accuracy and efficiency of cross-modal feature fusion and mitigate feature inconsistency. Experimental results demonstrate that in multimodal hyperspectral image semantic segmentation, the proposed model reduces computational complexity, measured in GFLOPs, by at least 61 % while maintaining a low parameter count, achieving optimal overall accuracy (OA) of around 92 %, and effectively balancing performance and computational efficiency.</div></div>\",\"PeriodicalId\":19268,\"journal\":{\"name\":\"Neurocomputing\",\"volume\":\"657 \",\"pages\":\"Article 131555\"},\"PeriodicalIF\":6.5000,\"publicationDate\":\"2025-09-16\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Neurocomputing\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S0925231225022271\",\"RegionNum\":2,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Neurocomputing","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0925231225022271","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
Efficient semantic segmentation of remote sensing images through dynamic feature enhancement and multimodal alignment fusion
Multimodal fusion has shown promise for integrating information from different modalities. However, existing multimodal fusion approaches in remote sensing face two main challenges. First, fusion models built on Convolutional Neural Networks (CNNs) or Vision Transformers (ViTs) are limited in long-range modeling capability or incur high computational complexity, while fusion models based on state-space models (SSMs) are prone to feature redundancy because they rely on multiple scanning paths, which likewise raises computational cost. Second, existing methods do not fully address inter-modal heterogeneity, leading to poor multimodal data fusion. To address these issues, we propose AFMamba, an efficient SSM-based multimodal fusion network for semantic segmentation of remote sensing images. Specifically, we design the Efficient Dynamic Visual State Space (EDVSS) module, which improves the efficiency of the standard Mamba model by dynamically enhancing local features and reducing channel redundancy. Furthermore, we introduce the Cross Attention Alignment Fusion (CAAFM) module, which combines cross-image attention fusion with channel interaction alignment to improve the accuracy and efficiency of cross-modal feature fusion and mitigate feature inconsistency. Experimental results on multimodal hyperspectral image semantic segmentation demonstrate that the proposed model reduces computational complexity (GFLOPs) by at least 61% while maintaining a low parameter count, achieves the best overall accuracy (OA) of around 92%, and effectively balances performance and computational efficiency.
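The abstract only outlines the two modules at a high level. As a rough illustration of the kind of fusion the CAAFM describes, the following is a minimal PyTorch sketch that combines cross-image attention (each modality attends to the other) with a channel-interaction gate. All class, variable, and shape choices here are assumptions made for illustration and are not taken from the paper.

# Minimal sketch of a cross-attention alignment fusion block in the spirit of
# the CAAFM described above. Module names, tensor shapes, and the exact
# attention/gating layout are illustrative assumptions, not the paper's design.
import torch
import torch.nn as nn


class CrossAttentionAlignmentFusion(nn.Module):
    def __init__(self, dim: int, num_heads: int = 4):
        super().__init__()
        # Cross-image attention: each modality queries the other.
        self.attn_a2b = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.attn_b2a = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        # Channel interaction alignment: a squeeze-and-excite style gate over the
        # concatenated channels to reduce channel-wise mismatch between modalities.
        self.channel_gate = nn.Sequential(
            nn.Linear(2 * dim, dim), nn.ReLU(inplace=True),
            nn.Linear(dim, 2 * dim), nn.Sigmoid(),
        )
        self.proj = nn.Linear(2 * dim, dim)

    def forward(self, feat_a: torch.Tensor, feat_b: torch.Tensor) -> torch.Tensor:
        # feat_a, feat_b: (B, N, C) token sequences from two modality branches,
        # e.g. hyperspectral and an auxiliary modality flattened over space.
        a_enh, _ = self.attn_a2b(feat_a, feat_b, feat_b)   # modality A attends to B
        b_enh, _ = self.attn_b2a(feat_b, feat_a, feat_a)   # modality B attends to A
        fused = torch.cat([a_enh, b_enh], dim=-1)          # (B, N, 2C)
        gate = self.channel_gate(fused.mean(dim=1))        # (B, 2C) channel weights
        fused = fused * gate.unsqueeze(1)                  # channel re-weighting
        return self.proj(fused)                            # (B, N, C) fused features


if __name__ == "__main__":
    x_hsi = torch.randn(2, 64, 128)   # hypothetical hyperspectral tokens
    x_aux = torch.randn(2, 64, 128)   # hypothetical auxiliary-modality tokens
    out = CrossAttentionAlignmentFusion(128)(x_hsi, x_aux)
    print(out.shape)                  # torch.Size([2, 64, 128])

The symmetric attention pair keeps both modalities as queries, while the shared channel gate is one simple way to realize the "channel interaction alignment" idea; the paper's actual module may differ in structure and cost.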
Journal introduction:
Neurocomputing publishes articles describing recent fundamental contributions in the field of neurocomputing. Neurocomputing theory, practice and applications are the essential topics being covered.