Cascaded CNN and global–local attention transformer network-based semantic segmentation for high-resolution remote sensing image

IF 1.4 | Zone 4, Earth Sciences | Q4 ENVIRONMENTAL SCIENCES

Journal of Applied Remote Sensing | Pub Date: 2024-07-01 | DOI: 10.1117/1.jrs.18.034502

Xiaohui Liu, Lei Zhang, Rui Wang, Xiaoyu Li, Jiyang Xu, Xiaochen Lu

Citations: 0

Abstract

High-resolution remote sensing images (HRRSIs) contain rich local spatial information and long-range location dependencies, both of which play an important role in semantic segmentation and have received increasing research attention. However, because ground objects are diverse and complex, HRRSIs often exhibit large intraclass variance and small interclass variance, which makes semantic segmentation highly challenging. In most existing networks, insufficient local feature extraction and poor use of global information lead to many omitted small-scale objects and fragmented large-scale objects in the segmentation results. To address this, we propose the CNN-transformer cascade network, which cascades a convolutional neural network with a global–local attention transformer. First, convolution blocks extract multiscale local features, and global–local attention transformer blocks capture long-range location information. Then, a multilevel channel attention integration block fuses geometric and semantic features from different depths and uses a channel attention module to revise the channel weights, suppressing interference from redundant information. Finally, upsampling is performed with a deconvolution operation to improve the smoothness of the segmentation. We compare our method with several state-of-the-art methods on the ISPRS Vaihingen and Potsdam datasets. Experimental results show that our method improves the integrity and independence of segmentation results for multiscale objects.
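The multilevel fusion step can be illustrated with a minimal NumPy sketch. The abstract does not specify the internals of the multilevel channel attention integration block, so this assumes a squeeze-and-excitation (SE) style channel attention (global average pooling, two fully connected layers, sigmoid gating) applied after nearest-neighbour upsampling and channel concatenation. The function names, tensor shapes, reduction ratio, and random weights are all hypothetical and are not the authors' design.

```python
import numpy as np

def upsample_nearest(x, factor):
    """Nearest-neighbour upsampling of a (C, H, W) feature map by an integer factor."""
    return x.repeat(factor, axis=1).repeat(factor, axis=2)

def channel_attention(x, w1, w2):
    """Hypothetical SE-style channel reweighting of a (C, H, W) map.

    w1: (C // r, C) and w2: (C, C // r) are the weights of the two fully
    connected layers (random here; learned in a real network).
    """
    squeezed = x.mean(axis=(1, 2))                  # global average pool -> (C,)
    hidden = np.maximum(w1 @ squeezed, 0.0)         # FC + ReLU -> (C // r,)
    weights = 1.0 / (1.0 + np.exp(-(w2 @ hidden)))  # FC + sigmoid -> (C,), each in (0, 1)
    return x * weights[:, None, None], weights      # rescale every channel

def fuse_multilevel(shallow, deep, w1, w2):
    """Upsample the deep map to the shallow resolution, concatenate, then reweight."""
    factor = shallow.shape[1] // deep.shape[1]
    stacked = np.concatenate([shallow, upsample_nearest(deep, factor)], axis=0)
    return channel_attention(stacked, w1, w2)

rng = np.random.default_rng(0)
shallow = rng.standard_normal((16, 32, 32))  # hypothetical high-resolution "geometric" features
deep = rng.standard_normal((32, 8, 8))       # hypothetical low-resolution "semantic" features
c, r = 16 + 32, 4                            # fused channel count, assumed reduction ratio
w1 = rng.standard_normal((c // r, c)) * 0.1
w2 = rng.standard_normal((c, c // r)) * 0.1
fused, weights = fuse_multilevel(shallow, deep, w1, w2)
print(fused.shape, weights.shape)  # (48, 32, 32) (48,)
```

In a trained network `w1` and `w2` would be learned, so the gate can push channels that carry redundant information toward 0 while keeping informative channels close to 1.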

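The deconvolution upsampling step can be sketched as a single-channel transposed convolution in NumPy. This is a simplified illustration that assumes no padding, no bias, and a single channel; the abstract gives no kernel sizes or strides, and `conv_transpose2d` is a hypothetical helper, not the authors' code. A real network would typically use a framework layer such as PyTorch's `torch.nn.ConvTranspose2d` with learned, multi-channel kernels.

```python
import numpy as np

def conv_transpose2d(x, kernel, stride=2):
    """Single-channel 2-D transposed convolution (no padding, no bias).

    Each input pixel scatters a copy of the kernel, scaled by the pixel value,
    into the output; overlapping contributions are summed. The output size is
    ((H - 1) * stride + kH, (W - 1) * stride + kW).
    """
    h, w = x.shape
    kh, kw = kernel.shape
    out = np.zeros(((h - 1) * stride + kh, (w - 1) * stride + kw))
    for i in range(h):
        for j in range(w):
            r, c = i * stride, j * stride
            out[r:r + kh, c:c + kw] += x[i, j] * kernel
    return out

x = np.array([[1.0, 2.0],
              [3.0, 4.0]])

# A 2x2 all-ones kernel with stride 2 copies each pixel into a 2x2 block,
# which is exactly nearest-neighbour upsampling.
blocky = conv_transpose2d(x, np.ones((2, 2)))

# A fixed 3x3 "tent" kernel makes neighbouring contributions overlap; away from
# the borders this gives linear interpolation between neighbouring pixels.
tent = np.outer([0.5, 1.0, 0.5], [0.5, 1.0, 0.5])
smooth = conv_transpose2d(x, tent)
interior = smooth[1:4, 1:4]  # rows: [1, 1.5, 2], [2, 2.5, 3], [3, 3.5, 4]
```

The contrast between the two kernels shows why a learned transposed convolution can yield smoother output than fixed block copying: the overlapping kernel blends neighbouring inputs, and training can adapt that blending instead of fixing it by hand.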
Source journal

Journal of Applied Remote Sensing (Environmental Sciences; Imaging Science & Photographic Technology)

CiteScore: 3.40
Self-citation rate: 11.80%
Articles published: 194
Review time: 3 months

Journal description: The Journal of Applied Remote Sensing is a peer-reviewed journal that optimizes the communication of concepts, information, and progress among the remote sensing community.