Enhanced Swin Transformer and Edge Spatial Attention for Remote Sensing Image Semantic Segmentation

Fuxiang Liu; Zhiqiang Hu; Lei Li; Hanlu Li; Xinxin Liu

IEEE Signal Processing Letters, vol. 32, pp. 1296–1300, 2025. DOI: 10.1109/LSP.2025.3550858. Published 2025-03-12. https://ieeexplore.ieee.org/document/10924312/
Citation count: 0
Abstract
Combining convolutional neural networks (CNNs) and transformers is a crucial direction in remote sensing image semantic segmentation. However, because the two architectures differ in their spatial information focus and feature extraction methods, existing feature transfer and fusion strategies do not effectively integrate the advantages of both. To address these issues, we propose a CNN-transformer hybrid network for precise remote sensing image semantic segmentation. We propose a novel Swin Transformer block that optimizes feature extraction and enables the model to handle remote sensing images of arbitrary sizes. Additionally, we design an Edge Spatial Attention module that focuses attention on local edge structures, effectively integrating global features with local details and facilitating efficient information flow between the transformer encoder and the CNN decoder. Finally, a multi-scale convolutional decoder fully leverages both the global information from the transformer and the local features from the CNN, leading to accurate segmentation results. Our network achieved state-of-the-art performance on the Vaihingen and Potsdam datasets, reaching mIoU and F1 scores of 67.37% and 79.82%, and of 72.39% and 83.68%, respectively.
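The abstract does not specify the internal design of the Edge Spatial Attention module, but the stated idea (reweighting features so that local edge structure is emphasized before global and local features are fused) can be sketched as follows. This is a hypothetical minimal NumPy illustration, not the authors' implementation: it derives a spatial attention map from Sobel gradient magnitudes of the channel-averaged feature map and uses it to reweight every channel.

```python
import numpy as np


def sobel_edges(x):
    """Gradient magnitude of a single-channel map x of shape (H, W) via 3x3 Sobel filters."""
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=np.float32)
    ky = kx.T
    pad = np.pad(x, 1, mode="edge")  # replicate borders so output keeps shape (H, W)
    gx = np.zeros_like(x)
    gy = np.zeros_like(x)
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            win = pad[i:i + 3, j:j + 3]
            gx[i, j] = (win * kx).sum()
            gy[i, j] = (win * ky).sum()
    return np.sqrt(gx ** 2 + gy ** 2)


def edge_spatial_attention(feat):
    """Hypothetical edge-aware spatial attention over a (C, H, W) feature map.

    1) Collapse channels to one spatial descriptor (mean pooling).
    2) Emphasize local edge structure via Sobel gradient magnitude.
    3) Squash to (0, 1) attention weights and reweight every channel.
    """
    desc = feat.mean(axis=0)                      # (H, W)
    edges = sobel_edges(desc)                     # (H, W), >= 0
    attn = 1.0 / (1.0 + np.exp(-edges))           # sigmoid -> (0, 1)
    return feat * attn[None, :, :], attn
```

In the paper's pipeline such a map would gate the transformer encoder's features before they enter the CNN decoder; a learned variant would replace the fixed Sobel filters with trainable convolutions.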
About the journal:
The IEEE Signal Processing Letters is a monthly, archival publication designed to provide rapid dissemination of original, cutting-edge ideas and timely, significant contributions in signal, image, speech, language, and audio processing. Papers published in the Letters can be presented within one year of their appearance at signal processing conferences such as ICASSP, GlobalSIP, and ICIP, as well as at several workshops organized by the Signal Processing Society.