Authors: Xuejiao Liao; Fangyuan Lei; Xun Liu; Li Guo; Alex Hay-Man Ng; Jinchang Ren
DOI: 10.1109/LGRS.2025.3583576
Journal: IEEE Geoscience and Remote Sensing Letters, vol. 22, pp. 1-5
Published: 2025-06-26 (Journal Article)
URL: https://ieeexplore.ieee.org/document/11052253/
An Attention Architecture With Twice Attention Convolution and Simplified Transformer for Hyperspectral Image Classification
Convolutional neural network (CNN) and transformer-based hybrid models have been successfully applied to hyperspectral image (HSI) classification, enhancing the local feature extraction capability of purely transformer-based models. However, the transformer branches of these hybrid models suffer from structural redundancy in components such as positional encoding (PE) and the multilayer perceptron (MLP). To address this issue, we propose a novel attention architecture, termed twice attention convolution module and simplified transformer (TAST), for HSI classification. The proposed TAST primarily consists of a twice attention convolution module (TACM) and a simplified transformer (ST). TACM is designed to improve the extraction of local features. In addition, we introduce the ST by removing the PE and MLP components from the original transformer, which captures long-range dependencies while simplifying the transformer's structure. Experimental results on four public datasets demonstrate that the proposed TAST model outperforms state-of-the-art CNN and transformer models in classification performance, with overall accuracy (OA) gains of about 3.87%–34.95% (Indian Pines), 0.35%–23.43% (Salinas), 0.37%–6.05% (WHU-Hi-LongKou), and 0.65%–10.79% (WHU-Hi-HongHu).
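To illustrate the idea behind the simplified transformer (ST), the sketch below shows a single-head self-attention block that keeps only the attention, residual connection, and layer normalization, with no positional encoding and no MLP sublayer. This is a minimal NumPy illustration under assumed shapes (16 spectral tokens, 32-dimensional embeddings), not the authors' actual implementation; the weight matrices and hyperparameters here are hypothetical.

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    """Normalize each token vector to zero mean and unit variance."""
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def softmax(x):
    """Numerically stable softmax over the last axis."""
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def simplified_transformer_block(x, Wq, Wk, Wv, Wo):
    """One ST-style block: scaled dot-product self-attention with a
    residual connection and layer norm; no positional encoding, no MLP."""
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    d = q.shape[-1]
    attn = softmax(q @ k.swapaxes(-1, -2) / np.sqrt(d))  # (tokens, tokens)
    return layer_norm(x + attn @ v @ Wo)

# Toy usage: a sequence of 16 spectral tokens with 32-dim embeddings.
rng = np.random.default_rng(0)
x = rng.standard_normal((16, 32))
Wq, Wk, Wv, Wo = (rng.standard_normal((32, 32)) * 0.1 for _ in range(4))
y = simplified_transformer_block(x, Wq, Wk, Wv, Wo)
```

Because the MLP and positional encoding are dropped, each block reduces to one attention sublayer, which is where the structural simplification over the standard transformer encoder comes from.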