ViT-UNet: A Vision Transformer Based UNet Model for Coastal Wetland Classification Based on High Spatial Resolution Imagery

IF 4.7 2区地球科学 Q1 ENGINEERING, ELECTRICAL & ELECTRONIC

IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing Pub Date : 2024-10-28 DOI:10.1109/JSTARS.2024.3487250

Nan Zhou;Mingming Xu;Biaoqun Shen;Ke Hou;Shanwei Liu;Hui Sheng;Yanfen Liu;Jianhua Wan

{"title":"ViT-UNet: A Vision Transformer Based UNet Model for Coastal Wetland Classification Based on High Spatial Resolution Imagery","authors":"Nan Zhou;Mingming Xu;Biaoqun Shen;Ke Hou;Shanwei Liu;Hui Sheng;Yanfen Liu;Jianhua Wan","doi":"10.1109/JSTARS.2024.3487250","DOIUrl":null,"url":null,"abstract":"High resolution remote sensing imagery plays a crucial role in monitoring coastal wetlands. Coastal wetland landscapes exhibit diverse features, ranging from fragmented patches to expansive areas. Mainstream convolutional neural networks cannot effectively analyze spatial relationships among consecutive image elements. This limitation impedes their performance in accurately classifying coastal wetlands. In order to tackle the above issues, we propose a Vision Transformer based UNet (ViT-UNet) model. This model extracts wetland features from high resolution remote sensing images by sensing and optimizing multiscale features. To establish global dependencies, the Vision Transformer (ViT) is introduced to replace the convolutional layer in the UNet encoder. Simultaneously, the model incorporates a convolutional block attention module and a multiple hierarchies attention module to restore attentional features and reduce feature loss. In addition, a skip connection is added to the single-skip structure of the original UNet model. This connection simultaneously links the output of the entire transformer and internal attention features to the corresponding decoder level. This enhancement aims to furnish the decoder with comprehensive global information guidance. Finally, all the extracted feature information is fused using Bilinear Polymerization Pooling (BPP). The BPP assists the network in obtaining a more comprehensive and detailed feature representation. Experimental results on the Gaofen-1 dataset demonstrate that the proposed ViT-UNet method achieves a Precision score of 93.50\n<inline-formula><tex-math>$\\%$</tex-math></inline-formula>\n, outperforming the original UNet model by 4.10\n<inline-formula><tex-math>$\\%$</tex-math></inline-formula>\n. Compared with other state-of-the-art networks, ViT-UNet performs more accurately and finer in the extraction of wetland information in the Yellow River Delta.","PeriodicalId":13116,"journal":{"name":"IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing","volume":"17 ","pages":"19575-19587"},"PeriodicalIF":4.7000,"publicationDate":"2024-10-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10737119","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing","FirstCategoryId":"5","ListUrlMain":"https://ieeexplore.ieee.org/document/10737119/","RegionNum":2,"RegionCategory":"地球科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"ENGINEERING, ELECTRICAL & ELECTRONIC","Score":null,"Total":0}

引用次数: 0

Abstract

High resolution remote sensing imagery plays a crucial role in monitoring coastal wetlands. Coastal wetland landscapes exhibit diverse features, ranging from fragmented patches to expansive areas. Mainstream convolutional neural networks cannot effectively analyze spatial relationships among consecutive image elements. This limitation impedes their performance in accurately classifying coastal wetlands. In order to tackle the above issues, we propose a Vision Transformer based UNet (ViT-UNet) model. This model extracts wetland features from high resolution remote sensing images by sensing and optimizing multiscale features. To establish global dependencies, the Vision Transformer (ViT) is introduced to replace the convolutional layer in the UNet encoder. Simultaneously, the model incorporates a convolutional block attention module and a multiple hierarchies attention module to restore attentional features and reduce feature loss. In addition, a skip connection is added to the single-skip structure of the original UNet model. This connection simultaneously links the output of the entire transformer and internal attention features to the corresponding decoder level. This enhancement aims to furnish the decoder with comprehensive global information guidance. Finally, all the extracted feature information is fused using Bilinear Polymerization Pooling (BPP). The BPP assists the network in obtaining a more comprehensive and detailed feature representation. Experimental results on the Gaofen-1 dataset demonstrate that the proposed ViT-UNet method achieves a Precision score of 93.50

$\%$

, outperforming the original UNet model by 4.10

$\%$

. Compared with other state-of-the-art networks, ViT-UNet performs more accurately and finer in the extraction of wetland information in the Yellow River Delta.

查看原文本刊更多论文

ViT-UNet：基于视觉转换器的 UNet 模型，用于基于高空间分辨率图像的沿海湿地分类

高分辨率遥感图像在监测沿岸湿地方面起着至关重要的作用。沿海湿地景观呈现出多种多样的特征，从支离破碎的斑块到广阔无垠的区域，不一而足。主流卷积神经网络无法有效分析连续图像元素之间的空间关系。这一局限性阻碍了它们在准确划分滨海湿地方面的性能。为了解决上述问题，我们提出了基于视觉变换器的 UNet（ViT-UNet）模型。该模型通过感知和优化多尺度特征，从高分辨率遥感图像中提取湿地特征。为了建立全局依赖关系，引入了视觉变换器（ViT）来替代 UNet 编码器中的卷积层。同时，该模型还加入了卷积块注意模块和多层次注意模块，以恢复注意特征并减少特征损失。此外，在原始 UNet 模型的单跳转结构中加入了跳转连接。该连接同时将整个变压器和内部注意特征的输出连接到相应的解码器层。这一改进旨在为解码器提供全面的全局信息指导。最后，使用双线性聚合池化（BPP）融合所有提取的特征信息。BPP 可帮助网络获得更全面、更详细的特征表示。在高分一号数据集上的实验结果表明，所提出的 ViT-UNet 方法的精确度达到了 93.50%，比原始 UNet 模型高出 4.10%。与其他最先进的网络相比，ViT-UNet 在提取黄河三角洲湿地信息方面的表现更准确、更精细。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

求助全文

约1分钟内获得全文求助全文

来源期刊

IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 地学-成像科学与照相技术

CiteScore

9.30

自引率

10.90%

发文量

563

审稿时长

4.7 months

期刊介绍： The IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing addresses the growing field of applications in Earth observations and remote sensing, and also provides a venue for the rapidly expanding special issues that are being sponsored by the IEEE Geosciences and Remote Sensing Society. The journal draws upon the experience of the highly successful “IEEE Transactions on Geoscience and Remote Sensing” and provide a complementary medium for the wide range of topics in applied earth observations. The ‘Applications’ areas encompasses the societal benefit areas of the Global Earth Observations Systems of Systems (GEOSS) program. Through deliberations over two years, ministers from 50 countries agreed to identify nine areas where Earth observation could positively impact the quality of life and health of their respective countries. Some of these are areas not traditionally addressed in the IEEE context. These include biodiversity, health and climate. Yet it is the skill sets of IEEE members, in areas such as observations, communications, computers, signal processing, standards and ocean engineering, that form the technical underpinnings of GEOSS. Thus, the Journal attracts a broad range of interests that serves both present members in new ways and expands the IEEE visibility into new areas.