AMMUNet: Multiscale Attention Map Merging for Remote Sensing Image Segmentation

Yang Yang, Shunyi Zheng, Xiqi Wang, Wei Ao, Zhao Liu
IEEE Geoscience and Remote Sensing Letters, vol. 22, pp. 1-5, published 2024-11-27. DOI: 10.1109/LGRS.2024.3506718. Citations: 0.

Abstract

The advancement of deep learning has driven notable progress in remote sensing semantic segmentation. Multihead self-attention (MSA) mechanisms have been widely adopted in semantic segmentation tasks. Network architectures exemplified by Vision Transformers have implemented window-based operations in the spatial domain to reduce computational costs. However, this approach comes at the expense of a weakened capacity to capture long-range dependencies, potentially limiting their efficacy in remote sensing image processing. In this letter, we propose AMMUNet, a UNet-based framework that employs multiscale attention map (AM) merging, comprising two key innovations: the attention map merging mechanism (AMMM) module and the granular multihead self-attention (GMSA). AMMM effectively combines multiscale AMs into a unified representation using a fixed mask template, enabling the modeling of a global attention mechanism. By integrating precomputed AMs in preceding layers, AMMM reduces computational costs while preserving global correlations. The proposed GMSA efficiently acquires global information while substantially mitigating computational costs in contrast to the global MSA mechanism. This is accomplished through the strategic alignment of granularity and the reduction of relative position bias parameters, thereby optimizing computational efficiency. Experimental evaluations highlight the superior performance of our approach, achieving remarkable mean intersection over union (mIoU) scores of 75.48% on the challenging Vaihingen dataset and an exceptional 77.90% on the Potsdam dataset, demonstrating the superiority of our method in precise remote sensing semantic segmentation. Code is available at https://github.com/interpretty/AMMUNet.
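The core idea described in the abstract — combining a fine-scale (window-local) attention map with a coarser-scale map into a unified representation under a fixed mask template — can be sketched conceptually. The function, mask layout, and merge rule below are hypothetical illustrations of multiscale attention-map merging in general, not the paper's actual AMMM implementation; see the linked repository for the real code.

```python
import numpy as np

def merge_attention_maps(fine_am, coarse_am, mask):
    """Conceptual multiscale merge (hypothetical, not the paper's AMMM):
    where the fixed mask is True, keep the fine-scale attention entry;
    elsewhere, fall back to the upsampled coarse-scale attention."""
    # Upsample the coarse map to the fine resolution by block repetition.
    scale = fine_am.shape[0] // coarse_am.shape[0]
    coarse_up = np.repeat(np.repeat(coarse_am, scale, axis=0),
                          scale, axis=1)
    merged = np.where(mask, fine_am, coarse_up)
    # Renormalize rows so each query's attention weights sum to 1.
    return merged / merged.sum(axis=-1, keepdims=True)

# Toy example: a 4x4 fine-scale map, a 2x2 coarse-scale map, and a
# fixed block-diagonal mask marking the within-window (local) entries.
fine = np.full((4, 4), 0.25)
coarse = np.full((2, 2), 0.5)
mask = np.kron(np.eye(2), np.ones((2, 2))).astype(bool)
merged = merge_attention_maps(fine, coarse, mask)
```

The merged map keeps fine-grained detail inside each window while the coarse map supplies cross-window (long-range) correlations, which is the general motivation the abstract gives for merging precomputed attention maps rather than recomputing global attention at full cost.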