{"title":"An Attention Architecture With Twice Attention Convolution and Simplified Transformer for Hyperspectral Image Classification","authors":"Xuejiao Liao;Fangyuan Lei;Xun Liu;Li Guo;Alex Hay-Man Ng;Jinchang Ren","doi":"10.1109/LGRS.2025.3583576","DOIUrl":null,"url":null,"abstract":"Convolutional neural network (CNN) and transformer-based hybrid models have been successfully applied to hyperspectral image (HSI) classification, enhancing the local feature extraction capability of single transformer-based models. However, these transformers in the hybrid models suffer from structural redundancy in components such as positional encoding (PE) and multilayer perceptron (MLP). To address the issue, we propose a novel attention architecture termed twice attention convolution module and simplified transformer (TAST) for HSI classification. The proposed TAST primarily consists of a twice attention convolution module (TACM) and a simplified transformer (ST). TACM is designed to improve the ability to extract local features. In addition, we introduce the ST by removing the PE and MLP components from the original transformer, which captures long-range dependencies while simplifying the structure of the original transformer. Experimental results on four public datasets demonstrate that the proposed TAST model outperforms both state-of-the-art CNN and transformer models in terms of classification performance, with improvements in terms of overall accuracy (OA) around 3.87%–34.95% (Indian Pines), 0.35%–23.43% (Salinas), 0.37%–6.05% (WHU-Hi-LongKou), and 0.65%–10.79% (WHU-Hi-HongHu).","PeriodicalId":91017,"journal":{"name":"IEEE geoscience and remote sensing letters : a publication of the IEEE Geoscience and Remote Sensing Society","volume":"22 ","pages":"1-5"},"PeriodicalIF":0.0000,"publicationDate":"2025-06-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE geoscience and remote sensing letters : a publication of the IEEE Geoscience and Remote Sensing Society","FirstCategoryId":"1085","ListUrlMain":"https://ieeexplore.ieee.org/document/11052253/","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Abstract
Hybrid models that combine convolutional neural networks (CNNs) with transformers have been successfully applied to hyperspectral image (HSI) classification, as the convolutional branch strengthens the local feature extraction that purely transformer-based models lack. However, the transformers in these hybrid models suffer from structural redundancy in components such as the positional encoding (PE) and the multilayer perceptron (MLP). To address this issue, we propose a novel attention architecture, termed twice attention convolution module and simplified transformer (TAST), for HSI classification. The proposed TAST primarily consists of a twice attention convolution module (TACM) and a simplified transformer (ST). The TACM is designed to improve the extraction of local features. In addition, we introduce the ST by removing the PE and MLP components from the original transformer; the ST still captures long-range dependencies while simplifying the original transformer's structure. Experimental results on four public datasets demonstrate that the proposed TAST model outperforms state-of-the-art CNN and transformer models in classification performance, with overall accuracy (OA) gains of 3.87%–34.95% (Indian Pines), 0.35%–23.43% (Salinas), 0.37%–6.05% (WHU-Hi-LongKou), and 0.65%–10.79% (WHU-Hi-HongHu).
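The letter itself provides no implementation details beyond the description above. As a rough illustration of what "removing the PE and MLP components from the original transformer" leaves behind, the following is a minimal PyTorch sketch of such an attention-only block: no positional encoding is added to the tokens, and the usual feed-forward (MLP) sub-layer is omitted. The embedding dimension, head count, pre-norm placement, and token shape are assumptions for the sketch, not the authors' configuration.

```python
# Minimal sketch of a "simplified transformer" (ST) block as described in the
# abstract: multi-head self-attention with a residual connection only, with no
# positional encoding (PE) and no MLP sub-layer. Hyperparameters are assumed.
import torch
import torch.nn as nn


class SimplifiedTransformerBlock(nn.Module):
    def __init__(self, embed_dim: int = 64, num_heads: int = 4):
        super().__init__()
        self.norm = nn.LayerNorm(embed_dim)  # pre-norm placement is an assumption
        self.attn = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, tokens, embed_dim); note no positional encoding is added.
        h = self.norm(x)
        attn_out, _ = self.attn(h, h, h)
        return x + attn_out  # residual around attention only; no MLP sub-layer


if __name__ == "__main__":
    tokens = torch.randn(2, 81, 64)  # hypothetical 9x9 spatial patch -> 81 tokens
    print(SimplifiedTransformerBlock()(tokens).shape)  # torch.Size([2, 81, 64])
```

In a full TAST pipeline, such a block would presumably consume tokens produced by the TACM's local feature maps, but the TACM's internal design is not specified in the abstract and is therefore not sketched here.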