Multi-Label Auroral Image Classification Based on CNN and Transformer

Hang Su, Qiuju Yang, Yixuan Ning, Zejun Hu, Lili Liu
{"title":"Multi-Label Auroral Image Classification Based on CNN and Transformer","authors":"Hang Su;Qiuju Yang;Yixuan Ning;Zejun Hu;Lili Liu","doi":"10.1109/TIP.2025.3550003","DOIUrl":null,"url":null,"abstract":"Auroral image classification has long been a focus of research in auroral physics. However, current methods for automatic auroral classification typically assume that only one type of aurora is present in an auroral image. This oversight neglects the complex transition states and coexistence of multiple types during the auroral evolution process, thus limiting the exploration of the intricate semantics of auroral images. To fully exploit the physical information embedded in auroral images, this paper proposes a multi-label auroral classification method, termed MLAC, which integrates convolutional neural network (CNN) and Transformer architectures. Firstly, we introduce a multi-scale feature fusion framework that enables the model to capture both fine-grained features and high-level information in auroral images, resulting in a more comprehensive representation of auroral features. Secondly, we propose a lightweight multi-head self-attention mechanism that captures long-range dependencies between pixels during the multiscale feature fusion process, which is crucial for distinguishing subtle differences between auroral types. Furthermore, we design a residual focused multilayer perceptron module that integrates large kernel depth-wise convolution with an improved multilayer perceptron. This integration enhances the model’s ability to represent complex spatial structure, thus improving local feature extraction and global contextual understanding. The proposed method achieves a mean average precision (mAP) of 88.20% on the auroral observation data collected at the Yellow River Station from 2003 to 2008. This performance significantly surpasses that of the most advanced multi-label classification models while maintaining competitive computational efficiency. Moreover, our method also outperforms the state-of-the-art multi-label methods in both computational efficiency and classification accuracy on two publicly available multi-label image datasets: WIDER-Attribute and VOC2007. These results demonstrate that our method skillfully leverages the robust feature extraction capability of CNNs for local features and the superior global information processing capability of Transformer.","PeriodicalId":94032,"journal":{"name":"IEEE transactions on image processing : a publication of the IEEE Signal Processing Society","volume":"34 ","pages":"1835-1848"},"PeriodicalIF":0.0000,"publicationDate":"2025-03-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE transactions on image processing : a publication of the IEEE Signal Processing Society","FirstCategoryId":"1085","ListUrlMain":"https://ieeexplore.ieee.org/document/10930642/","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

Abstract

Auroral image classification has long been a focus of research in auroral physics. However, current methods for automatic auroral classification typically assume that only one type of aurora is present in an image. This assumption neglects the complex transition states and the coexistence of multiple auroral types during auroral evolution, limiting the exploration of the intricate semantics of auroral images. To fully exploit the physical information embedded in auroral images, this paper proposes a multi-label auroral classification method, termed MLAC, which integrates convolutional neural network (CNN) and Transformer architectures. First, we introduce a multi-scale feature fusion framework that enables the model to capture both fine-grained features and high-level information in auroral images, yielding a more comprehensive representation of auroral features. Second, we propose a lightweight multi-head self-attention mechanism that captures long-range dependencies between pixels during multi-scale feature fusion, which is crucial for distinguishing subtle differences between auroral types. Furthermore, we design a residual focused multilayer perceptron module that integrates large-kernel depth-wise convolution with an improved multilayer perceptron. This integration enhances the model's ability to represent complex spatial structures, improving both local feature extraction and global contextual understanding. The proposed method achieves a mean average precision (mAP) of 88.20% on auroral observation data collected at the Yellow River Station from 2003 to 2008, significantly surpassing the most advanced multi-label classification models while maintaining competitive computational efficiency. Moreover, our method also outperforms state-of-the-art multi-label methods in both computational efficiency and classification accuracy on two publicly available multi-label image datasets: WIDER-Attribute and VOC2007. These results demonstrate that our method effectively leverages the robust local feature extraction capability of CNNs and the superior global information processing capability of the Transformer.
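To make the architectural description above more concrete, the sketch below illustrates what a "residual focused multilayer perceptron" block combining large-kernel depth-wise convolution with an MLP might look like in PyTorch. This is a minimal illustration only, not the authors' published implementation: the class name, kernel size, expansion ratio, and normalization choice are all assumptions for the purpose of the example.

```python
# Hypothetical sketch of a residual block that pairs a large-kernel depth-wise
# convolution with a position-wise MLP, as described in the abstract.
# All hyperparameters here are illustrative assumptions, not the paper's values.
import torch
import torch.nn as nn


class ResidualFocusedMLP(nn.Module):
    def __init__(self, channels: int, kernel_size: int = 7, expansion: int = 4):
        super().__init__()
        # Depth-wise convolution (groups == channels) with a large kernel
        # captures local spatial structure at low computational cost.
        self.dwconv = nn.Conv2d(
            channels, channels, kernel_size,
            padding=kernel_size // 2, groups=channels,
        )
        self.norm = nn.BatchNorm2d(channels)
        # 1x1 convolutions act as a channel-wise MLP applied at every position.
        self.mlp = nn.Sequential(
            nn.Conv2d(channels, channels * expansion, kernel_size=1),
            nn.GELU(),
            nn.Conv2d(channels * expansion, channels, kernel_size=1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Residual connection keeps the input features intact while the
        # depth-wise conv + MLP path refines local and channel context.
        return x + self.mlp(self.norm(self.dwconv(x)))


if __name__ == "__main__":
    block = ResidualFocusedMLP(channels=64)
    feats = torch.randn(2, 64, 32, 32)  # batch of feature maps
    print(block(feats).shape)           # torch.Size([2, 64, 32, 32])
```

In this sketch the residual path preserves the original multi-scale features while the depth-wise convolution enlarges the local receptive field before the MLP mixes channels, which is one plausible way to realize the "local feature extraction plus global contextual understanding" behavior described in the abstract.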