利用 Vision Mamba 在遥感图像语义分割中重新思考扫描策略：实验研究

IF 5.3 2区地球科学 Q1 ENGINEERING, ELECTRICAL & ELECTRONIC

IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing Pub Date : 2024-10-02 DOI:10.1109/JSTARS.2024.3472296

Qinfeng Zhu;Yuan Fang;Yuanzhi Cai;Cheng Chen;Lei Fan

{"title":"利用 Vision Mamba 在遥感图像语义分割中重新思考扫描策略：实验研究","authors":"Qinfeng Zhu;Yuan Fang;Yuanzhi Cai;Cheng Chen;Lei Fan","doi":"10.1109/JSTARS.2024.3472296","DOIUrl":null,"url":null,"abstract":"Deep learning methods, especially convolutional neural networks (CNNs) and vision transformers (ViTs), are frequently employed to perform semantic segmentation of high-resolution remotely sensed images. However, CNNs are constrained by their restricted receptive fields, while ViTs face challenges due to their quadratic complexity. Recently, the Mamba model, featuring linear complexity and a global receptive field, has gained extensive attention for vision tasks. In such tasks, images need to be serialized to form sequences compatible with the Mamba model. Numerous research efforts have explored scanning strategies to serialize images, aiming to enhance the Mamba model's understanding of images. However, the effectiveness of these scanning strategies remains uncertain. In this research, we conduct a comprehensive experimental investigation on the impact of mainstream scanning directions and their combinations on semantic segmentation of remotely sensed images. Through extensive experiments on the LoveDA, ISPRS Potsdam, ISPRS Vaihingen, and UAVid datasets, we demonstrate that no single scanning strategy outperforms others, regardless of their complexity or the number of scanning directions involved. A simple, single scanning direction is deemed sufficient for semantic segmentation of high-resolution remotely sensed images. Relevant directions for future research are also recommended.","PeriodicalId":13116,"journal":{"name":"IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing","volume":"17 ","pages":"18223-18234"},"PeriodicalIF":5.3000,"publicationDate":"2024-10-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10703181","citationCount":"0","resultStr":"{\"title\":\"Rethinking Scanning Strategies With Vision Mamba in Semantic Segmentation of Remote Sensing Imagery: An Experimental Study\",\"authors\":\"Qinfeng Zhu;Yuan Fang;Yuanzhi Cai;Cheng Chen;Lei Fan\",\"doi\":\"10.1109/JSTARS.2024.3472296\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Deep learning methods, especially convolutional neural networks (CNNs) and vision transformers (ViTs), are frequently employed to perform semantic segmentation of high-resolution remotely sensed images. However, CNNs are constrained by their restricted receptive fields, while ViTs face challenges due to their quadratic complexity. Recently, the Mamba model, featuring linear complexity and a global receptive field, has gained extensive attention for vision tasks. In such tasks, images need to be serialized to form sequences compatible with the Mamba model. Numerous research efforts have explored scanning strategies to serialize images, aiming to enhance the Mamba model's understanding of images. However, the effectiveness of these scanning strategies remains uncertain. In this research, we conduct a comprehensive experimental investigation on the impact of mainstream scanning directions and their combinations on semantic segmentation of remotely sensed images. Through extensive experiments on the LoveDA, ISPRS Potsdam, ISPRS Vaihingen, and UAVid datasets, we demonstrate that no single scanning strategy outperforms others, regardless of their complexity or the number of scanning directions involved. A simple, single scanning direction is deemed sufficient for semantic segmentation of high-resolution remotely sensed images. Relevant directions for future research are also recommended.\",\"PeriodicalId\":13116,\"journal\":{\"name\":\"IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing\",\"volume\":\"17 \",\"pages\":\"18223-18234\"},\"PeriodicalIF\":5.3000,\"publicationDate\":\"2024-10-02\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10703181\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing\",\"FirstCategoryId\":\"5\",\"ListUrlMain\":\"https://ieeexplore.ieee.org/document/10703181/\",\"RegionNum\":2,\"RegionCategory\":\"地球科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"ENGINEERING, ELECTRICAL & ELECTRONIC\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing","FirstCategoryId":"5","ListUrlMain":"https://ieeexplore.ieee.org/document/10703181/","RegionNum":2,"RegionCategory":"地球科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"ENGINEERING, ELECTRICAL & ELECTRONIC","Score":null,"Total":0}

引用次数: 0

摘要

深度学习方法，尤其是卷积神经网络（CNN）和视觉变换器（ViT），经常被用来对高分辨率遥感图像进行语义分割。然而，卷积神经网络受限于其有限的感受野，而视觉变换器则因其二次复杂性而面临挑战。最近，具有线性复杂性和全局感受野的 Mamba 模型在视觉任务中获得了广泛关注。在这类任务中，需要对图像进行序列化，以形成与曼巴模型兼容的序列。许多研究工作都在探索图像序列化的扫描策略，旨在增强曼巴模型对图像的理解。然而，这些扫描策略的有效性仍不确定。在本研究中，我们就主流扫描方向及其组合对遥感图像语义分割的影响进行了全面的实验研究。通过在 LoveDA、ISPRS Potsdam、ISPRS Vaihingen 和 UAVid 数据集上进行大量实验，我们证明了没有一种扫描策略优于其他策略，无论其复杂程度或涉及的扫描方向数量如何。对于高分辨率遥感图像的语义分割来说，简单的单一扫描方向就足够了。此外，还推荐了未来研究的相关方向。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Rethinking Scanning Strategies With Vision Mamba in Semantic Segmentation of Remote Sensing Imagery: An Experimental Study

Deep learning methods, especially convolutional neural networks (CNNs) and vision transformers (ViTs), are frequently employed to perform semantic segmentation of high-resolution remotely sensed images. However, CNNs are constrained by their restricted receptive fields, while ViTs face challenges due to their quadratic complexity. Recently, the Mamba model, featuring linear complexity and a global receptive field, has gained extensive attention for vision tasks. In such tasks, images need to be serialized to form sequences compatible with the Mamba model. Numerous research efforts have explored scanning strategies to serialize images, aiming to enhance the Mamba model's understanding of images. However, the effectiveness of these scanning strategies remains uncertain. In this research, we conduct a comprehensive experimental investigation on the impact of mainstream scanning directions and their combinations on semantic segmentation of remotely sensed images. Through extensive experiments on the LoveDA, ISPRS Potsdam, ISPRS Vaihingen, and UAVid datasets, we demonstrate that no single scanning strategy outperforms others, regardless of their complexity or the number of scanning directions involved. A simple, single scanning direction is deemed sufficient for semantic segmentation of high-resolution remotely sensed images. Relevant directions for future research are also recommended.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 地学-成像科学与照相技术

CiteScore

9.30

自引率

10.90%

发文量

563

审稿时长

4.7 months

期刊介绍： The IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing addresses the growing field of applications in Earth observations and remote sensing, and also provides a venue for the rapidly expanding special issues that are being sponsored by the IEEE Geosciences and Remote Sensing Society. The journal draws upon the experience of the highly successful “IEEE Transactions on Geoscience and Remote Sensing” and provide a complementary medium for the wide range of topics in applied earth observations. The ‘Applications’ areas encompasses the societal benefit areas of the Global Earth Observations Systems of Systems (GEOSS) program. Through deliberations over two years, ministers from 50 countries agreed to identify nine areas where Earth observation could positively impact the quality of life and health of their respective countries. Some of these are areas not traditionally addressed in the IEEE context. These include biodiversity, health and climate. Yet it is the skill sets of IEEE members, in areas such as observations, communications, computers, signal processing, standards and ocean engineering, that form the technical underpinnings of GEOSS. Thus, the Journal attracts a broad range of interests that serves both present members in new ways and expands the IEEE visibility into new areas.