Deep multimodal emotion recognition using modality-aware attention and proxy-based multimodal loss

IF 6 · Zone 3 (Computer Science) · Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS

Internet of Things · Pub Date: 2025-03-10 · DOI: 10.1016/j.iot.2025.101562

Sungpil Woo, Muhammad Zubair, Sunhwan Lim, Daeyoung Kim

Internet of Things, volume 31, Article 101562. Journal Article. https://www.sciencedirect.com/science/article/pii/S2542660525000757

Citations: 0

Abstract

Emotion recognition based on physiological signals has attracted significant attention in fields including affective computing, health, virtual reality, robotics, and content rating. Recent technological advances have produced multi-modal bio-sensing systems that improve data collection efficiency by recording and tracking multiple bio-signals simultaneously. However, integrating multiple physiological signals for emotion recognition is challenging because diverse data types must be fused. Differences in signal characteristics and noise levels degrade the classification performance of a multi-modal system, so effective feature extraction and fusion techniques are needed to combine the most informative features from each modality without causing feature conflict. To this end, this study introduces a multi-modal emotion recognition method that uses electroencephalogram (EEG) and electrocardiogram (ECG) data to classify different levels of arousal and valence. The proposed deep multimodal architecture uses a novel modality-aware attention mechanism to highlight mutually important and emotion-specific features. Additionally, a novel proxy-based multimodal loss function supervises training to ensure that each modality contributes constructively while preserving its unique characteristics. By addressing multi-modal signal fusion and emotion-specific feature extraction, the proposed architecture learns a constructive and complementary representation of multiple physiological signals and thus significantly improves emotion recognition performance. Experiments and visualizations on the AMIGOS dataset demonstrate the efficacy of the proposed method for emotion classification.
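
The abstract does not give the formulation of the attention mechanism or the loss, so the following is only a generic sketch of the two ideas it names, not the authors' method. It assumes each modality (electroencephalogram and electrocardiogram) has already been encoded into a feature vector of the same length, that attention weights come from a dot product with a hypothetical query vector, and that the loss compares each modality's embedding against one shared proxy vector per class. All function names, the query, and the proxies are illustrative assumptions.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def attention_fuse(eeg_feat, ecg_feat, query):
    """Weight each modality by the softmax of its score against a query.

    Assumption: a single query vector scores both modalities; the paper's
    modality-aware attention may be structured differently.
    """
    weights = softmax([dot(eeg_feat, query), dot(ecg_feat, query)])
    fused = [weights[0] * e + weights[1] * c
             for e, c in zip(eeg_feat, ecg_feat)]
    return fused, weights


def proxy_loss(embedding, proxies, label):
    """Cross-entropy over similarities between an embedding and class proxies."""
    probs = softmax([dot(embedding, p) for p in proxies])
    return -math.log(probs[label])


def multimodal_proxy_loss(eeg_feat, ecg_feat, proxies, label):
    """Sum of per-modality proxy losses against shared class proxies.

    Assumption: an unweighted sum; the actual loss may weight or couple
    the modalities in another way.
    """
    return (proxy_loss(eeg_feat, proxies, label)
            + proxy_loss(ecg_feat, proxies, label))


# Toy example: two classes (e.g. low and high arousal), 2-D features.
proxies = [[1.0, 0.0], [0.0, 1.0]]
eeg = [2.0, 0.0]
ecg = [1.5, 0.5]
fused, weights = attention_fuse(eeg, ecg, query=[1.0, 0.0])
loss_correct = multimodal_proxy_loss(eeg, ecg, proxies, label=0)
loss_wrong = multimodal_proxy_loss(eeg, ecg, proxies, label=1)
```

In this toy setup both embeddings lie closer to the first proxy, so the loss for label 0 is smaller than for label 1, and the modality whose features align better with the query receives the larger attention weight. Sharing the proxies across modalities is one simple way to pull both embeddings toward the same class anchor while keeping the encoders separate.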

Source journal

Internet of Things Multiple-

CiteScore

3.60

Self-citation rate

5.10%

Articles published

115

Review time

37 days

Journal description: Internet of Things; Engineering Cyber Physical Human Systems is a comprehensive journal encouraging cross-collaboration between researchers, engineers, and practitioners in the field of IoT and Cyber Physical Human Systems. The journal offers a unique platform to exchange scientific information on the entire breadth of technology, science, and societal applications of the IoT. The journal will place a high priority on timely publication, and provide a home for high quality. Furthermore, IOT is interested in publishing topical Special Issues on any aspect of IOT.