ViT-UNet: A Vision Transformer Based UNet Model for Coastal Wetland Classification Based on High Spatial Resolution Imagery

Impact factor 4.7 | Zone 2 (Earth Sciences) | JCR Q1 (Engineering, Electrical & Electronic)
Nan Zhou;Mingming Xu;Biaoqun Shen;Ke Hou;Shanwei Liu;Hui Sheng;Yanfen Liu;Jianhua Wan
Published in: IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, vol. 17, pp. 19575-19587, 2024-10-28 (journal article)
DOI: 10.1109/JSTARS.2024.3487250
Full text: https://ieeexplore.ieee.org/document/10737119/

Abstract

High-resolution remote sensing imagery plays a crucial role in monitoring coastal wetlands. Coastal wetland landscapes are diverse, ranging from fragmented patches to expansive areas. Mainstream convolutional neural networks cannot effectively model spatial relationships among consecutive image elements, which limits their accuracy in classifying coastal wetlands. To address these issues, we propose a Vision Transformer based UNet (ViT-UNet) model. The model extracts wetland features from high-resolution remote sensing images by sensing and optimizing multiscale features. To establish global dependencies, a Vision Transformer (ViT) replaces the convolutional layer in the UNet encoder. The model also incorporates a convolutional block attention module and a multiple hierarchies attention module to restore attentional features and reduce feature loss. In addition, a skip connection is added to the single-skip structure of the original UNet model. This connection links both the output of the entire transformer and the internal attention features to the corresponding decoder level, giving the decoder comprehensive global information guidance. Finally, all extracted feature information is fused using Bilinear Polymerization Pooling (BPP), which helps the network obtain a more comprehensive and detailed feature representation. Experimental results on the Gaofen-1 dataset show that the proposed ViT-UNet method achieves a Precision score of 93.50%, outperforming the original UNet model by 4.10%. Compared with other state-of-the-art networks, ViT-UNet extracts wetland information in the Yellow River Delta more accurately and in finer detail.
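The abstract reports a single Precision figure but does not say how it is aggregated across wetland classes. As a minimal sketch of one common convention, the code below computes per-class precision from a confusion matrix and then takes the unweighted (macro) average. The function names, the macro averaging, and the 3-class confusion matrix are all assumptions made for illustration. None of them come from the paper.

```python
from typing import List


def per_class_precision(confusion: List[List[int]]) -> List[float]:
    """Precision for each class.

    confusion[i][j] is the number of pixels whose true class is i
    and whose predicted class is j.
    """
    n = len(confusion)
    precisions = []
    for j in range(n):
        # All pixels the model labelled as class j (column sum).
        predicted_as_j = sum(confusion[i][j] for i in range(n))
        true_positive = confusion[j][j]
        # Assumption: report 0.0 for a class that was never predicted.
        precisions.append(true_positive / predicted_as_j if predicted_as_j else 0.0)
    return precisions


def macro_precision(confusion: List[List[int]]) -> float:
    """Unweighted mean of the per-class precisions (macro averaging)."""
    values = per_class_precision(confusion)
    return sum(values) / len(values)


# Hypothetical 3-class confusion matrix with made-up counts (not from the paper).
confusion = [
    [50, 5, 0],
    [10, 40, 0],
    [0, 5, 90],
]

if __name__ == "__main__":
    print(per_class_precision(confusion))
    print(macro_precision(confusion))
```

Macro averaging weights every class equally, so small fragmented wetland classes count as much as large ones. A pixel-weighted average would instead be dominated by the most extensive classes, which is why the averaging convention matters when comparing reported scores.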
Source journal
CiteScore: 9.30
Self-citation rate: 10.90%
Articles published: 563
Review time: 4.7 months
Journal description: The IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing addresses the growing field of applications in Earth observations and remote sensing, and also provides a venue for the rapidly expanding special issues that are being sponsored by the IEEE Geoscience and Remote Sensing Society. The journal draws upon the experience of the highly successful "IEEE Transactions on Geoscience and Remote Sensing" and provides a complementary medium for the wide range of topics in applied Earth observations. The "Applications" areas encompass the societal benefit areas of the Global Earth Observation System of Systems (GEOSS) program. Through deliberations over two years, ministers from 50 countries agreed to identify nine areas where Earth observation could positively impact the quality of life and health of their respective countries. Some of these are areas not traditionally addressed in the IEEE context, including biodiversity, health, and climate. Yet it is the skill sets of IEEE members, in areas such as observations, communications, computers, signal processing, standards, and ocean engineering, that form the technical underpinnings of GEOSS. Thus, the journal attracts a broad range of interests that serves present members in new ways and expands IEEE visibility into new areas.