End-to-End Convolutional Network and Spectral-Spatial Transformer Architecture for Hyperspectral Image Classification

Shiping Li, Lianhui Liang, Shaoquan Zhang, Ying Zhang, Antonio Plaza, Xuehua Wang

Remote Sensing, published 2024-01-12. DOI: 10.3390/rs16020325 (https://doi.org/10.3390/rs16020325). Journal Article; JCR Q2 (Environmental Sciences), Impact Factor 4.2.
Citations: 0
Abstract
Although convolutional neural networks (CNNs) have proven successful for hyperspectral image classification (HSIC), their limited receptive field makes it difficult to characterize long-range global dependencies among HSI pixels and across spectral bands. The transformer compensates well for this shortcoming, but compared with CNNs it lacks image-specific inductive biases (e.g., locality and translation equivariance) and contextual position information. To overcome these challenges, we introduce a simply structured, end-to-end convolutional network and spectral-spatial transformer (CNSST) architecture for HSIC. Our CNSST architecture consists of two essential components: a simple 3D-CNN-based hierarchical feature fusion network and a spectral-spatial transformer that introduces inductive bias information. The former employs a 3D-CNN-based hierarchical feature fusion structure to establish the correlation between spectral and spatial (SAS) information while capturing richer inductive biases and more discriminative local spectral-spatial hierarchical features; the latter establishes global dependencies among HSI pixels while enhancing the acquisition of local information through the introduced inductive bias. Specifically, spectral and inductive bias information is incorporated into the transformer's multi-head self-attention (MHSA) mechanism, making the attention both spectrally aware and location-aware. Furthermore, a Lion optimizer is exploited to boost the classification performance of the proposed CNSST. Extensive experiments conducted on three publicly available hyperspectral datasets demonstrate that our proposed CNSST outperforms other state-of-the-art approaches.
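The abstract's central mechanism, adding bias terms to the attention logits so that the transformer becomes location-aware, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the projection weights here are random stand-ins for learned parameters, only the positional bias term is shown (the paper also injects spectral information), and patch size and dimensions are arbitrary.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def mhsa_with_position_bias(tokens, num_heads, pos_bias, rng):
    """Multi-head self-attention over a patch's pixel tokens, with an
    additive position-bias term on the attention logits (one common way
    to make attention location-aware, as the CNSST abstract describes)."""
    n, d = tokens.shape
    dh = d // num_heads
    # Random projections stand in for learned Q/K/V weight matrices.
    Wq, Wk, Wv = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(3))
    q = (tokens @ Wq).reshape(n, num_heads, dh).transpose(1, 0, 2)  # (heads, n, dh)
    k = (tokens @ Wk).reshape(n, num_heads, dh).transpose(1, 0, 2)
    v = (tokens @ Wv).reshape(n, num_heads, dh).transpose(1, 0, 2)
    logits = q @ k.transpose(0, 2, 1) / np.sqrt(dh)   # (heads, n, n)
    attn = softmax(logits + pos_bias, axis=-1)        # bias injects position info
    out = (attn @ v).transpose(1, 0, 2).reshape(n, d) # concatenate heads
    return out, attn

rng = np.random.default_rng(0)
n, d, heads = 25, 32, 4                    # a 5x5 spatial patch flattened to 25 tokens
tokens = rng.standard_normal((n, d))       # per-pixel feature vectors (e.g., from a 3D-CNN)
pos_bias = 0.1 * rng.standard_normal((1, n, n))  # illustrative bias, shared across heads
out, attn = mhsa_with_position_bias(tokens, heads, pos_bias, rng)
print(out.shape, attn.shape)  # prints (25, 32) (4, 25, 25)
```

In the paper's full architecture the token features would come from the 3D-CNN hierarchical fusion branch and the bias terms would be learned; the sketch only shows where such biases enter the attention computation.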
About the Journal
Remote Sensing (ISSN 2072-4292) publishes regular research papers, reviews, letters and communications covering all aspects of the remote sensing process, from instrument design and signal processing to the retrieval of geophysical parameters and their application in geosciences. The journal aims to encourage scientists to publish experimental, theoretical and computational results in as much detail as possible; full experimental details must be provided so that results can be easily reproduced. There is no restriction on the length of papers.