End-to-End Convolutional Network and Spectral-Spatial Transformer Architecture for Hyperspectral Image Classification

Shiping Li, Lianhui Liang, Shaoquan Zhang, Ying Zhang, Antonio Plaza, Xuehua Wang

Remote Sensing, published 2024-01-12. DOI: 10.3390/rs16020325 (https://doi.org/10.3390/rs16020325). Journal Article; JCR Q2 (Environmental Sciences), Impact Factor 4.2.
Citations: 0
Abstract
Although convolutional neural networks (CNNs) have proven successful for hyperspectral image classification (HSIC), their limited receptive field makes it difficult to characterize long-range global dependencies among HSI pixels and across spectral bands. The transformer compensates well for this shortcoming, but compared with CNNs it lacks image-specific inductive biases (e.g., locality and translation equivariance) and contextual position information. To overcome these challenges, we introduce a simply structured, end-to-end convolutional network and spectral-spatial transformer (CNSST) architecture for HSIC. Our CNSST architecture consists of two essential components: a simple 3D-CNN-based hierarchical feature fusion network and a spectral-spatial transformer that introduces inductive bias information. The former employs a 3D-CNN-based hierarchical feature fusion structure to establish the correlation between spectral and spatial (SAS) information while capturing richer inductive biases and more discriminative local spectral-spatial hierarchical features; the latter establishes global dependencies among HSI pixels while enhancing the acquisition of local information through the introduced inductive bias. Specifically, spectral and inductive bias information is incorporated into the transformer's multi-head self-attention (MHSA) mechanism, making the attention both spectrally aware and location-aware. Furthermore, a Lion optimizer is exploited to boost the classification performance of the proposed CNSST. Extensive experiments conducted on three publicly available hyperspectral datasets demonstrate that our proposed CNSST outperforms other state-of-the-art approaches.
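The abstract's central mechanism, adding bias terms to the attention logits so that the transformer becomes location-aware, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the projection weights here are random stand-ins for learned parameters, only the positional bias term is shown (the paper also injects spectral information), and patch size and dimensions are arbitrary.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def mhsa_with_position_bias(tokens, num_heads, pos_bias, rng):
    """Multi-head self-attention over a patch's pixel tokens, with an
    additive position-bias term on the attention logits (one common way
    to make attention location-aware, as the CNSST abstract describes)."""
    n, d = tokens.shape
    dh = d // num_heads
    # Random projections stand in for learned Q/K/V weight matrices.
    Wq, Wk, Wv = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(3))
    q = (tokens @ Wq).reshape(n, num_heads, dh).transpose(1, 0, 2)  # (heads, n, dh)
    k = (tokens @ Wk).reshape(n, num_heads, dh).transpose(1, 0, 2)
    v = (tokens @ Wv).reshape(n, num_heads, dh).transpose(1, 0, 2)
    logits = q @ k.transpose(0, 2, 1) / np.sqrt(dh)   # (heads, n, n)
    attn = softmax(logits + pos_bias, axis=-1)        # bias injects position info
    out = (attn @ v).transpose(1, 0, 2).reshape(n, d) # concatenate heads
    return out, attn

rng = np.random.default_rng(0)
n, d, heads = 25, 32, 4                    # a 5x5 spatial patch flattened to 25 tokens
tokens = rng.standard_normal((n, d))       # per-pixel feature vectors (e.g., from a 3D-CNN)
pos_bias = 0.1 * rng.standard_normal((1, n, n))  # illustrative bias, shared across heads
out, attn = mhsa_with_position_bias(tokens, heads, pos_bias, rng)
print(out.shape, attn.shape)  # prints (25, 32) (4, 25, 25)
```

In the paper's full architecture the token features would come from the 3D-CNN hierarchical fusion branch and the bias terms would be learned; the sketch only shows where such biases enter the attention computation.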
About the Journal
Remote Sensing (ISSN 2072-4292) publishes regular research papers, reviews, letters and communications covering all aspects of the remote sensing process, from instrument design and signal processing to the retrieval of geophysical parameters and their application in geosciences. The journal aims to encourage scientists to publish experimental, theoretical and computational results in as much detail as possible; full experimental details must be provided so that results can be easily reproduced. There is no restriction on the length of papers.