{"title":"An Attention Architecture With Twice Attention Convolution and Simplified Transformer for Hyperspectral Image Classification","authors":"Xuejiao Liao;Fangyuan Lei;Xun Liu;Li Guo;Alex Hay-Man Ng;Jinchang Ren","doi":"10.1109/LGRS.2025.3583576","DOIUrl":null,"url":null,"abstract":"Convolutional neural network (CNN) and transformer-based hybrid models have been successfully applied to hyperspectral image (HSI) classification, enhancing the local feature extraction capability of single transformer-based models. However, these transformers in the hybrid models suffer from structural redundancy in components such as positional encoding (PE) and multilayer perceptron (MLP). To address the issue, we propose a novel attention architecture termed twice attention convolution module and simplified transformer (TAST) for HSI classification. The proposed TAST primarily consists of a twice attention convolution module (TACM) and a simplified transformer (ST). TACM is designed to improve the ability to extract local features. In addition, we introduce the ST by removing the PE and MLP components from the original transformer, which captures long-range dependencies while simplifying the structure of the original transformer. Experimental results on four public datasets demonstrate that the proposed TAST model outperforms both state-of-the-art CNN and transformer models in terms of classification performance, with improvements in terms of overall accuracy (OA) around 3.87%–34.95% (Indian Pines), 0.35%–23.43% (Salinas), 0.37%–6.05% (WHU-Hi-LongKou), and 0.65%–10.79% (WHU-Hi-HongHu).","PeriodicalId":91017,"journal":{"name":"IEEE geoscience and remote sensing letters : a publication of the IEEE Geoscience and Remote Sensing Society","volume":"22 ","pages":"1-5"},"PeriodicalIF":0.0000,"publicationDate":"2025-06-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE geoscience and remote sensing letters : a publication of the IEEE Geoscience and Remote Sensing Society","FirstCategoryId":"1085","ListUrlMain":"https://ieeexplore.ieee.org/document/11052253/","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Abstract
Hybrid models that combine convolutional neural networks (CNNs) with transformers have been successfully applied to hyperspectral image (HSI) classification, as the convolutional branch strengthens the local feature extraction that purely transformer-based models lack. However, the transformers in these hybrid models suffer from structural redundancy in components such as the positional encoding (PE) and the multilayer perceptron (MLP). To address this issue, we propose a novel attention architecture, termed twice attention convolution module and simplified transformer (TAST), for HSI classification. The proposed TAST primarily consists of a twice attention convolution module (TACM) and a simplified transformer (ST). The TACM is designed to improve the extraction of local features. In addition, we introduce the ST by removing the PE and MLP components from the original transformer; the ST still captures long-range dependencies while simplifying the original transformer's structure. Experimental results on four public datasets demonstrate that the proposed TAST model outperforms state-of-the-art CNN and transformer models in classification performance, with overall accuracy (OA) gains of 3.87%–34.95% (Indian Pines), 0.35%–23.43% (Salinas), 0.37%–6.05% (WHU-Hi-LongKou), and 0.65%–10.79% (WHU-Hi-HongHu).
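The letter itself provides no implementation details beyond the description above. As a rough illustration of what "removing the PE and MLP components from the original transformer" leaves behind, the following is a minimal PyTorch sketch of such an attention-only block: no positional encoding is added to the tokens, and the usual feed-forward (MLP) sub-layer is omitted. The embedding dimension, head count, pre-norm placement, and token shape are assumptions for the sketch, not the authors' configuration.

```python
# Minimal sketch of a "simplified transformer" (ST) block as described in the
# abstract: multi-head self-attention with a residual connection only, with no
# positional encoding (PE) and no MLP sub-layer. Hyperparameters are assumed.
import torch
import torch.nn as nn


class SimplifiedTransformerBlock(nn.Module):
    def __init__(self, embed_dim: int = 64, num_heads: int = 4):
        super().__init__()
        self.norm = nn.LayerNorm(embed_dim)  # pre-norm placement is an assumption
        self.attn = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, tokens, embed_dim); note no positional encoding is added.
        h = self.norm(x)
        attn_out, _ = self.attn(h, h, h)
        return x + attn_out  # residual around attention only; no MLP sub-layer


if __name__ == "__main__":
    tokens = torch.randn(2, 81, 64)  # hypothetical 9x9 spatial patch -> 81 tokens
    print(SimplifiedTransformerBlock()(tokens).shape)  # torch.Size([2, 81, 64])
```

In a full TAST pipeline, such a block would presumably consume tokens produced by the TACM's local feature maps, but the TACM's internal design is not specified in the abstract and is therefore not sketched here.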