Zhifen Guo, Jiao Wang, Bin Zhang, Yating Ku, Fengbin Ma
A dual transfer learning method based on 3D-CNN and vision transformer for emotion recognition
Applied Intelligence, volume 55, issue 2. Published 2024-12-21.
DOI: 10.1007/s10489-024-05976-z
https://link.springer.com/article/10.1007/s10489-024-05976-z
Impact factor: 3.4 (JCR Q2, Computer Science, Artificial Intelligence)
Citations: 0
Abstract
In medical science, emotion recognition based on the electroencephalogram (EEG) is widely used in affective computing. Although deep learning is prevalent in EEG signal analysis, standard convolutional and recurrent neural networks process EEG data poorly because they are limited in capturing global dependencies and in handling the non-linear, unstable characteristics of EEG signals. We propose a dual transfer learning method that combines a 3D Convolutional Neural Network (3D-CNN) with a Vision Transformer (ViT) to improve emotion recognition. The 3D-CNN captures the spatial characteristics of EEG signals and reduces data covariance while extracting shallow features; the ViT improves the model's ability to capture long-range dependencies and extracts deep features. The method has two stages. First, the front end of a pre-trained 3D-CNN serves as a shallow feature extractor that mitigates EEG data covariance and transformer biases and focuses on low-level feature detection. Second, the ViT serves as a deep feature extractor that models the global aspects of EEG signals and uses attention mechanisms for classification. We also present an algorithm for data mapping in transfer learning that keeps feature representations consistent across the spatial and temporal dimensions. This approach improves global feature processing and long-range dependency detection, and the integration of color channels increases the model's sensitivity to signal variations.
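The two-stage design can be made concrete by tracking tensor shapes from the 3D-CNN front end to the ViT input. The sketch below is only an illustration under stated assumptions: the input size (3 color channels, 16 frames, 224 x 224 pixels), the kernel, stride, and padding of the single 3D convolution, the 16 x 16 patch size, and all names such as `Conv3dSpec` are hypothetical and are not settings reported in the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Conv3dSpec:
    """Hypothetical settings for one 3D convolution layer (not from the paper)."""
    out_channels: int
    kernel: tuple   # (depth, height, width)
    stride: tuple
    padding: tuple


def conv3d_output_shape(in_shape, spec):
    """Apply the standard convolution size formula to a (C, D, H, W) shape."""
    _, *dims = in_shape
    out = [
        (size + 2 * p - k) // s + 1
        for size, k, s, p in zip(dims, spec.kernel, spec.stride, spec.padding)
    ]
    return (spec.out_channels, *out)


def vit_token_count(height, width, patch):
    """Number of non-overlapping patch tokens a ViT sees for one 2D map."""
    if height % patch or width % patch:
        raise ValueError("patch size must divide the feature map")
    return (height // patch) * (width // patch)


# Assumed input: 3 color channels, 16 frames, 224 x 224 pixels per frame.
eeg_volume = (3, 16, 224, 224)
front_end = Conv3dSpec(
    out_channels=64, kernel=(3, 7, 7), stride=(1, 2, 2), padding=(1, 3, 3)
)

# Stage 1: shallow features from the (assumed) 3D-CNN front end.
shallow = conv3d_output_shape(eeg_volume, front_end)
# Stage 2: each spatial map is cut into patches that become ViT tokens.
tokens_per_frame = vit_token_count(shallow[2], shallow[3], patch=16)
print(shallow, tokens_per_frame)
```

With these assumed settings the front end halves the spatial resolution and keeps the temporal depth, so the attention layers operate on a short token sequence per frame rather than on raw pixels.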
In a 10-fold cross-validation experiment on the DEAP dataset, the proposed method achieves classification accuracies of 92.44% and 92.85% for the valence and arousal dimensions, respectively. For four-class classification across valence and arousal (H/L = high/low, V = valence, A = arousal), the accuracies are HVHA: 88.01%, HVLA: 88.27%, LVHA: 90.89%, and LVLA: 78.84%. On the SEED dataset, the method achieves an accuracy of 98.69%. Overall, it holds substantial potential for advancing emotion recognition and contributes to the broader field of affective computing.
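The four classes reported for DEAP combine a binary valence label with a binary arousal label, and the evaluation uses 10 folds. A minimal sketch of both steps follows. The threshold of 5 on a 1 to 9 rating scale is a common convention for DEAP and is assumed here, as are the function names, the shuffling seed, and the example ratings; the paper's actual preprocessing and fold construction may differ.

```python
import random


def quadrant_label(valence, arousal, threshold=5.0):
    """Map self-assessment ratings to HVHA / HVLA / LVHA / LVLA.

    Assumption: ratings strictly above `threshold` count as "high".
    """
    v = "HV" if valence > threshold else "LV"
    a = "HA" if arousal > threshold else "LA"
    return v + a


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train, test) index lists for k roughly equal, disjoint folds."""
    order = list(range(n_samples))
    random.Random(seed).shuffle(order)
    folds = [order[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


# Made-up (valence, arousal) ratings, one per quadrant.
ratings = [(7.1, 6.3), (8.0, 2.5), (3.2, 7.7), (1.9, 4.0)]
labels = [quadrant_label(v, a) for v, a in ratings]
print(labels)

# Every sample lands in exactly one test fold across the 10 splits.
splits = list(k_fold_indices(n_samples=25, k=10))
print(len(splits), [len(test) for _, test in splits])
```

Reported accuracy under this scheme would be the mean over the 10 test folds; whether the folds are drawn per subject or across subjects is not stated in the abstract.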
About the journal:
With a focus on research in artificial intelligence and neural networks, this journal addresses issues involving solutions of real-life manufacturing, defense, management, government and industrial problems which are too complex to be solved through conventional approaches and require the simulation of intelligent thought processes, heuristics, applications of knowledge, and distributed and parallel processing. The integration of these multiple approaches in solving complex problems is of particular importance.
The journal presents new and original research and technological developments, addressing real and complex issues applicable to difficult problems. It provides a medium for exchanging scientific research and technological achievements accomplished by the international community.