{"title":"MSC-transformer-based 3D-attention with knowledge distillation for multi-action classification of separate lower limbs","authors":"Heng Yan , Zilu Wang , Junhua Li","doi":"10.1016/j.neunet.2025.107806","DOIUrl":null,"url":null,"abstract":"<div><div>Deep learning has been extensively applied to motor imagery (MI) classification using electroencephalogram (EEG). However, most existing deep learning models do not extract features from EEG using dimension-specific attention mechanisms based on the characteristics of each dimension (e.g., spatial dimension), while effectively integrate local and global features. Furthermore, implicit information generated by the models has been ignored, leading to underutilization of essential information of EEG. Although MI classification has been relatively thoroughly investigated, the exploration of classification including real movement (RM) and motor observation (MO) is very limited, especially for separate lower limbs. To address the above problems and limitations, we proposed a multi-scale separable convolutional Transformer-based filter-spatial-temporal attention model (MSC-T3AM) to classify multiple lower limb actions. In MSC-T3AM, spatial attention, filter and temporal attention modules are embedded to allocate appropriate attention to each dimension. Multi-scale separable convolutions (MSC) are separately applied after the projections of query, key, and value in self-attention module to improve computational efficiency and classification performance. Furthermore, knowledge distillation (KD) was utilized to help model learn suitable probability distribution. The comparison results demonstrated that MSC-T3AM with online KD achieved best performance in classification accuracy, exhibiting an elevation of 2 %-19 % compared to a few counterpart models. The visualization of features extracted by MSC-T3AM with online KD reiterated the superiority of the proposed model. The ablation results showed that filter and temporal attention modules contributed most for performance improvement (improved by 2.8 %), followed by spatial attention module (1.2 %) and MSC module (1 %). Our study also suggested that online KD was better than offline KD and the case without KD. The code of MSC-T3AM is available at: <span><span>https://github.com/BICN001/MSC-T3AM</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":49763,"journal":{"name":"Neural Networks","volume":"191 ","pages":"Article 107806"},"PeriodicalIF":6.3000,"publicationDate":"2025-06-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Neural Networks","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0893608025006860","RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
Citations: 0
Abstract
Deep learning has been extensively applied to motor imagery (MI) classification using electroencephalogram (EEG). However, most existing deep learning models do not extract features from EEG using dimension-specific attention mechanisms tailored to the characteristics of each dimension (e.g., the spatial dimension), nor do they effectively integrate local and global features. Furthermore, implicit information generated by the models has been ignored, leading to underutilization of essential information in the EEG. Although MI classification has been investigated relatively thoroughly, the exploration of classification involving real movement (RM) and motor observation (MO) is very limited, especially for separate lower limbs. To address these problems and limitations, we proposed a multi-scale separable convolutional Transformer-based filter-spatial-temporal attention model (MSC-T3AM) to classify multiple lower limb actions. In MSC-T3AM, filter, spatial, and temporal attention modules are embedded to allocate appropriate attention to each dimension. Multi-scale separable convolutions (MSC) are separately applied after the query, key, and value projections in the self-attention module to improve computational efficiency and classification performance. Furthermore, knowledge distillation (KD) was utilized to help the model learn a suitable probability distribution. The comparison results demonstrated that MSC-T3AM with online KD achieved the best classification accuracy, with an improvement of 2%-19% over several counterpart models. The visualization of features extracted by MSC-T3AM with online KD further confirmed the superiority of the proposed model. The ablation results showed that the filter and temporal attention modules contributed the most to performance improvement (2.8%), followed by the spatial attention module (1.2%) and the MSC module (1%). Our study also suggested that online KD outperformed both offline KD and the case without KD. The code of MSC-T3AM is available at: https://github.com/BICN001/MSC-T3AM.
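To make the two core ideas in the abstract concrete, below is a minimal PyTorch sketch of (1) multi-scale separable convolutions applied to the query, key, and value projections inside a self-attention module, and (2) an online (mutual-learning) knowledge-distillation loss between two peer classifiers. This is not the authors' implementation (see the linked repository); the kernel sizes, single-head attention, temperature, and loss weighting are illustrative assumptions only.

```python
# Conceptual sketch, NOT the authors' code (see https://github.com/BICN001/MSC-T3AM).
# Kernel sizes, dimensions, temperature, and the KD weight are illustrative.
import torch
import torch.nn as nn
import torch.nn.functional as F


class MultiScaleSeparableConv(nn.Module):
    """Depthwise (separable) 1-D convolutions at several temporal scales,
    summed so the output keeps the input shape (batch, channels, time)."""

    def __init__(self, channels, kernel_sizes=(3, 7, 15)):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Conv1d(channels, channels, k, padding=k // 2, groups=channels)
            for k in kernel_sizes
        )

    def forward(self, x):                      # x: (batch, channels, time)
        return sum(branch(x) for branch in self.branches)


class MSCSelfAttention(nn.Module):
    """Single-head self-attention whose Q/K/V projections are each refined
    by a multi-scale separable convolution before the attention product."""

    def __init__(self, dim):
        super().__init__()
        self.q_proj, self.k_proj, self.v_proj = (nn.Linear(dim, dim) for _ in range(3))
        self.q_conv, self.k_conv, self.v_conv = (MultiScaleSeparableConv(dim) for _ in range(3))
        self.scale = dim ** -0.5

    def forward(self, x):                      # x: (batch, time, dim)
        def refine(proj, conv):
            h = proj(x).transpose(1, 2)        # (batch, dim, time) for the conv
            return conv(h).transpose(1, 2)     # back to (batch, time, dim)

        q = refine(self.q_proj, self.q_conv)
        k = refine(self.k_proj, self.k_conv)
        v = refine(self.v_proj, self.v_conv)
        attn = torch.softmax(q @ k.transpose(1, 2) * self.scale, dim=-1)
        return attn @ v


def online_kd_loss(logits_a, logits_b, labels, temperature=4.0, alpha=0.5):
    """Symmetric online distillation: each branch fits the labels and also
    mimics the other's softened output (a common mutual-learning form)."""
    ce = F.cross_entropy(logits_a, labels) + F.cross_entropy(logits_b, labels)
    soft_a = F.log_softmax(logits_a / temperature, dim=-1)
    soft_b = F.log_softmax(logits_b / temperature, dim=-1)
    kd = F.kl_div(soft_a, soft_b.exp(), reduction="batchmean") \
       + F.kl_div(soft_b, soft_a.exp(), reduction="batchmean")
    return ce + alpha * temperature ** 2 * kd
```

The sketch only illustrates how convolutional refinement of Q/K/V and an online KD objective fit together; the paper's full model additionally includes the filter, spatial, and temporal attention modules described above.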
About the Journal
Neural Networks is a platform that aims to foster an international community of scholars and practitioners interested in neural networks, deep learning, and other approaches to artificial intelligence and machine learning. Our journal invites submissions covering various aspects of neural networks research, from computational neuroscience and cognitive modeling to mathematical analyses and engineering applications. By providing a forum for interdisciplinary discussions between biology and technology, we aim to encourage the development of biologically-inspired artificial intelligence.