DISD-Net: A Dynamic Interactive Network With Self-Distillation for Cross-Subject Multi-Modal Emotion Recognition

IF 9.7 | CAS Zone 1 (Computer Science) | JCR Q1, COMPUTER SCIENCE, INFORMATION SYSTEMS
Cheng Cheng;Wenzhe Liu;Xinying Wang;Lin Feng;Ziyu Jia
{"title":"DISD-Net: A Dynamic Interactive Network With Self-Distillation for Cross-Subject Multi-Modal Emotion Recognition","authors":"Cheng Cheng;Wenzhe Liu;Xinying Wang;Lin Feng;Ziyu Jia","doi":"10.1109/TMM.2025.3535344","DOIUrl":null,"url":null,"abstract":"Multi-modal Emotion Recognition (MER) has demonstrated competitive performance in affective computing, owing to synthesizing information from diverse modalities. However, many existing approaches still face unresolved challenges, such as: (i) how to learn compact yet representative features from multi-modal data simultaneously and (ii) how to address differences among subjects and enhance the generalization of the emotion recognition model, given the diverse nature of individual biological signals. To this end, we propose a Dynamic Interactive Network with Self-Distillation (DISD-Net) for cross-subject MER. The DISD-Net incorporates a dynamin interactive module to capture the intra- and inter-modal interactions from multi-modal data. Additionally, to enhance compactness in modal representations, we leverage the soft labels generated by the DISD-Net model as supplemental training guidance. This involves incorporating self-distillation, aiming to transfer the knowledge that the DISD-Net model contains hard and soft labels to each modality. Finally, domain adaptation (DA) is seamlessly integrated into the dynamic interactive and self-distillation components, forming a unified framework to extract subject-invariant multi-modal emotional features. Experimental results indicate that the proposed model achieves a mean accuracy of 75.00% with a standard deviation of 7.68% for the DEAP dataset and a mean accuracy of 65.65% with a standard deviation of 5.08% for the SEED-IV dataset.","PeriodicalId":13273,"journal":{"name":"IEEE Transactions on Multimedia","volume":"27 ","pages":"4643-4655"},"PeriodicalIF":9.7000,"publicationDate":"2025-01-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Multimedia","FirstCategoryId":"94","ListUrlMain":"https://ieeexplore.ieee.org/document/10857425/","RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, INFORMATION SYSTEMS","Score":null,"Total":0}
Citations: 0

Abstract

Multi-modal Emotion Recognition (MER) has demonstrated competitive performance in affective computing, owing to its ability to synthesize information from diverse modalities. However, many existing approaches still face unresolved challenges, such as: (i) how to learn compact yet representative features from multi-modal data simultaneously and (ii) how to address differences among subjects and enhance the generalization of the emotion recognition model, given the diverse nature of individual biological signals. To this end, we propose a Dynamic Interactive Network with Self-Distillation (DISD-Net) for cross-subject MER. The DISD-Net incorporates a dynamic interactive module to capture the intra- and inter-modal interactions from multi-modal data. Additionally, to enhance the compactness of the modal representations, we leverage the soft labels generated by the DISD-Net model as supplemental training guidance. This involves incorporating self-distillation, which transfers the knowledge contained in the DISD-Net model's hard and soft labels to each modality. Finally, domain adaptation (DA) is seamlessly integrated with the dynamic interactive and self-distillation components, forming a unified framework that extracts subject-invariant multi-modal emotional features. Experimental results indicate that the proposed model achieves a mean accuracy of 75.00% with a standard deviation of 7.68% on the DEAP dataset and a mean accuracy of 65.65% with a standard deviation of 5.08% on the SEED-IV dataset.
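The abstract does not include any code, so the sketch below is only a rough illustration of how a self-distillation objective of the kind described above is commonly assembled: each modality branch is supervised by the ground-truth (hard) labels and, via a temperature-scaled KL term, by the softened predictions of the fused head, and a gradient-reversal layer is one standard way to attach a subject-adversarial domain-adaptation term. The names `self_distillation_loss`, `modality_logits`, `fused_logits`, and the hyperparameters `T` and `alpha` are hypothetical; this is not the authors' DISD-Net implementation.

```python
# Minimal sketch (assumed formulation, not the authors' code): per-modality
# self-distillation from the fused prediction, plus a gradient-reversal layer
# that a subject classifier could use for domain adaptation.
import torch
import torch.nn.functional as F


class GradReverse(torch.autograd.Function):
    """Identity in the forward pass; flips and scales gradients in the
    backward pass. Commonly used to learn subject/domain-invariant features."""

    @staticmethod
    def forward(ctx, x, lambd=1.0):
        ctx.lambd = lambd
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lambd * grad_output, None


def self_distillation_loss(modality_logits, fused_logits, hard_labels,
                           T=2.0, alpha=0.5):
    """Combine hard-label supervision with soft-label distillation from the
    fused (teacher) prediction into every modality (student) branch.

    modality_logits: list of [B, C] tensors, one per modality branch.
    fused_logits:    [B, C] tensor from the fused multi-modal head.
    hard_labels:     [B] ground-truth emotion labels.
    T, alpha:        temperature and hard/soft mixing weight (assumed values).
    """
    # The fused head is trained on the ground-truth (hard) labels.
    loss = F.cross_entropy(fused_logits, hard_labels)

    # Softened teacher targets; detached so the teacher is not pulled
    # toward each student branch.
    soft_targets = F.softmax(fused_logits.detach() / T, dim=1)

    for logits in modality_logits:
        hard_term = F.cross_entropy(logits, hard_labels)
        soft_term = F.kl_div(F.log_softmax(logits / T, dim=1),
                             soft_targets, reduction="batchmean") * (T * T)
        loss = loss + (1.0 - alpha) * hard_term + alpha * soft_term

    return loss
```

One conventional way to realize the domain-adaptation component is to feed `GradReverse.apply(shared_features)` into a subject classifier trained with cross-entropy on subject IDs, which pushes the shared features toward subject invariance; the paper's exact DA formulation may differ.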
Source journal

IEEE Transactions on Multimedia (Engineering & Technology - Telecommunications)
CiteScore: 11.70
Self-citation rate: 11.00%
Articles published: 576
Review time: 5.5 months

About the journal: The IEEE Transactions on Multimedia delves into diverse aspects of multimedia technology and applications, covering circuits, networking, signal processing, systems, software, and systems integration. The scope aligns with the Fields of Interest of the sponsors, ensuring a comprehensive exploration of research in multimedia.