{"title":"A multi-modal emotion recognition method considering the contribution and redundancy of channels and the correlation and heterogeneity of modalities","authors":"Yongxuan Wen, Wanzhong Chen","doi":"10.1016/j.measurement.2025.119247","DOIUrl":null,"url":null,"abstract":"<div><div>Physiological signals could reflect individual true emotional state, and emotion recognition based on physiological signals is significant in the field of artificial intelligence. However, current multimodal emotion recognition methods used full channels, leading to data redundancy and hardware complexity, causing a waste of computing resources. In addition, existing feature fusion methods generally adopted a direct connection approach, lacking of mid-level alignment and interaction, which cannot effectively extract complementary features from multimodal information, thus affecting classification accuracy. To address the above-mentioned issues, this paper proposed a multimodal emotion recognition method based on both electroencephalogram signals (EEG) and peripheral physiological signals (PPS). First, we introduced a triple-weighted ReliefF-NMI channel selection (TWRNCS) to select channels for EEG signals where the triple weight of subject-feature-frequency band were considered, and the contribution and redundancy of EEG channels are screened in two stages. Secondly, we designed an adaptive feature extractor capable of automatically exacting features from multi-channel EEG and PPS. Additionally, we proposed a cross-modal hybrid attention module (CHAM) based on self-attention and cross-attention mechanisms, including intra-modality private pipelines and inter-modality common pipelines. The private pipelines used self-attention mechanisms to retain heterogeneous information of modalities, while the common pipelines used cross-attention and self-attention mechanisms to capture cross-modal correlations. Finally, the information from different modalities was fully integrated for classification. The experiments demonstrated that our model achieved accuracy of over 98% on the DEAP and MAHNOB-HCI datasets, which proved the superiority of this paper in emotion recognition tasks.</div></div>","PeriodicalId":18349,"journal":{"name":"Measurement","volume":"258 ","pages":"Article 119247"},"PeriodicalIF":5.6000,"publicationDate":"2025-10-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Measurement","FirstCategoryId":"5","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0263224125026065","RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"ENGINEERING, MULTIDISCIPLINARY","Score":null,"Total":0}
Citations: 0
Abstract
Physiological signals can reflect an individual's true emotional state, and emotion recognition based on physiological signals is significant in the field of artificial intelligence. However, current multimodal emotion recognition methods use all available channels, which introduces data redundancy and hardware complexity and wastes computing resources. In addition, existing feature fusion methods generally adopt direct concatenation, lacking mid-level alignment and interaction, and therefore cannot effectively extract complementary features from multimodal information, which limits classification accuracy. To address these issues, this paper proposes a multimodal emotion recognition method based on both electroencephalogram (EEG) signals and peripheral physiological signals (PPS). First, we introduce a triple-weighted ReliefF-NMI channel selection (TWRNCS) method that selects EEG channels by considering triple subject-feature-frequency-band weights and screens channels for contribution and redundancy in two stages. Second, we design an adaptive feature extractor capable of automatically extracting features from multi-channel EEG and PPS. Additionally, we propose a cross-modal hybrid attention module (CHAM) based on self-attention and cross-attention mechanisms, comprising intra-modality private pipelines and inter-modality common pipelines. The private pipelines use self-attention to retain the heterogeneous information of each modality, while the common pipelines use cross-attention and self-attention to capture cross-modal correlations. Finally, the information from the different modalities is fully integrated for classification. Experiments demonstrate that our model achieves accuracies of over 98% on the DEAP and MAHNOB-HCI datasets, confirming the superiority of the proposed method in emotion recognition tasks.
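The abstract does not disclose the TWRNCS algorithm itself, so the following is only a rough illustrative sketch of two-stage contribution/redundancy channel screening. The triple subject-feature-frequency-band weighting is not specified in the abstract and is omitted here; `mutual_info_classif` stands in for the ReliefF contribution score, and the function name `screen_channels` and the parameters `keep`, `nmi_thresh`, and `bins` are invented for illustration.

```python
# Illustrative two-stage channel screening, NOT the paper's exact TWRNCS.
# Stage 1 ranks channels by contribution to the labels (stand-in for ReliefF);
# stage 2 drops channels that are redundant with already-kept ones, measured
# by normalized mutual information (NMI) between discretized channel features.
import numpy as np
from sklearn.feature_selection import mutual_info_classif
from sklearn.metrics import normalized_mutual_info_score

def screen_channels(X, y, keep=16, nmi_thresh=0.8, bins=8):
    """X: (trials, channels) channel-level features; y: (trials,) labels."""
    # Stage 1: contribution screening, highest-scoring channels first.
    contribution = mutual_info_classif(X, y)
    order = np.argsort(contribution)[::-1]

    # Discretize a continuous channel feature so NMI can be computed on it.
    def discretize(col):
        edges = np.histogram_bin_edges(col, bins=bins)[1:-1]
        return np.digitize(col, edges)

    # Stage 2: redundancy screening, greedily skipping channels whose NMI
    # with any already-selected channel exceeds the threshold.
    selected = []
    for ch in order:
        if all(normalized_mutual_info_score(discretize(X[:, ch]),
                                            discretize(X[:, s])) < nmi_thresh
               for s in selected):
            selected.append(ch)
        if len(selected) == keep:
            break
    return selected
```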
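Likewise, the abstract gives no architecture details for CHAM, so here is a minimal PyTorch sketch in the spirit of the described module: per-modality self-attention "private" pipelines plus cross-attention-then-self-attention "common" pipelines. The embedding size, number of heads, class name, and the fusion-by-concatenation rule are all assumptions, not the authors' design.

```python
# Minimal sketch of a cross-modal hybrid attention block inspired by the
# described CHAM; dimensions, head count, and the fusion rule are assumed.
import torch
import torch.nn as nn

class CrossModalHybridAttention(nn.Module):
    def __init__(self, dim=64, heads=4):
        super().__init__()
        # Private pipelines: per-modality self-attention keeps heterogeneity.
        self.self_eeg = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.self_pps = nn.MultiheadAttention(dim, heads, batch_first=True)
        # Common pipelines: cross-attention captures inter-modal correlation,
        # followed by self-attention over the mixed token sequence.
        self.cross_eeg = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.cross_pps = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.self_common = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, eeg, pps):
        """eeg: (B, Le, dim) EEG tokens; pps: (B, Lp, dim) peripheral tokens."""
        # Private: each modality attends only to itself.
        priv_e, _ = self.self_eeg(eeg, eeg, eeg)
        priv_p, _ = self.self_pps(pps, pps, pps)
        # Common: EEG queries PPS and vice versa (args: query, key, value).
        cross_e, _ = self.cross_eeg(eeg, pps, pps)
        cross_p, _ = self.cross_pps(pps, eeg, eeg)
        common = torch.cat([cross_e, cross_p], dim=1)
        common, _ = self.self_common(common, common, common)
        # Fuse private and common streams along the sequence axis;
        # the paper's actual fusion step is not stated in the abstract.
        return torch.cat([priv_e, priv_p, common], dim=1)

# Usage example: 32 EEG tokens and 8 PPS tokens per sample.
block = CrossModalHybridAttention()
fused = block(torch.randn(2, 32, 64), torch.randn(2, 8, 64))  # (2, 80, 64)
```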
Journal description:
Contributions are invited on novel achievements in all fields of measurement and instrumentation science and technology. Authors are encouraged to submit novel material, whose ultimate goal is an advancement in the state of the art of: measurement and metrology fundamentals, sensors, measurement instruments, measurement and estimation techniques, measurement data processing and fusion algorithms, evaluation procedures and methodologies for plants and industrial processes, performance analysis of systems, processes and algorithms, mathematical models for measurement-oriented purposes, distributed measurement systems in a connected world.