Joint Multi-Scale Multimodal Transformer for Emotion Using Consumer Devices

IF 4.3 | JCR Q1 (Engineering, Electrical & Electronic) | CAS Region 2, Computer Science
Mustaqeem Khan;Jamil Ahmad;Wail Gueaieb;Giulia De Masi;Fakhri Karray;Abdulmotaleb El Saddik
DOI: 10.1109/TCE.2025.3532322
Journal: IEEE Transactions on Consumer Electronics, vol. 71, no. 1, pp. 1092-1101
Published: 2025-01-21 (Journal Article)
URL: https://ieeexplore.ieee.org/document/10848157/
Citations: 0

Abstract

Joint Multi-Scale Multimodal Transformer for Emotion Using Consumer Devices
The field of Multimodal Emotion Recognition (MER) has made considerable advancements in recent years; however, the opportunity to leverage the synergistic relationships between different modalities remains largely untapped. This paper introduces an MER approach employing a Joint Multi-Scale Multimodal Transformer (JMMT) with recursive cross-attention for naturalistic recognition of emotions by enhancing and capturing inter- and intra-modal relationships across both (visual and audio) modalities. We compute multi-scale attention weights based on cross-correlations between multi-scale joint representations of combined and individual cues to capture inter- and intra-modal dynamics. The outputs of the individual modalities are fed back as recursive inputs during fusion to further refine the features. Our JMMT model presents a cost-effective solution for consumer devices by capturing synergistic characteristics across visual and audio inputs. The JMMT model outperforms state-of-the-art (SOTA) MER methods, as evaluated on the IEMOCAP and MELD datasets.
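The core idea in the abstract, cross-attention between audio and visual streams whose refined outputs are fed back recursively during fusion, can be illustrated with a minimal NumPy sketch. This is not the paper's implementation: the feature shapes, the residual update, the number of refinement steps, and the mean-pooled joint representation are all illustrative assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(query, key_value):
    """Attend from one modality (query) to the other (key_value)."""
    d = query.shape[-1]
    scores = query @ key_value.T / np.sqrt(d)    # (Tq, Tk) similarities
    return softmax(scores, axis=-1) @ key_value  # (Tq, d) attended features

def recursive_fusion(audio, visual, steps=2):
    """Refine each modality with cross-attention over the other, feeding
    the refined features back in on the next step (recursive refinement)."""
    a, v = audio, visual
    for _ in range(steps):
        a_new = a + cross_attention(a, v)  # audio attends to visual
        v_new = v + cross_attention(v, a)  # visual attends to audio
        a, v = a_new, v_new
    # Joint representation: concatenate temporally pooled modality features.
    return np.concatenate([a.mean(axis=0), v.mean(axis=0)])

rng = np.random.default_rng(0)
audio = rng.standard_normal((20, 64))   # 20 audio frames, 64-dim features
visual = rng.standard_normal((16, 64))  # 16 video frames, 64-dim features
joint = recursive_fusion(audio, visual)
print(joint.shape)  # (128,)
```

The multi-scale attention weights and cross-correlation terms described in the abstract would sit on top of such a loop; this sketch only captures the bidirectional cross-attention with recursive feedback.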
Source journal: IEEE Transactions on Consumer Electronics
CiteScore: 7.70
Self-citation rate: 9.30%
Annual articles: 59
Review time: 3.3 months
Journal scope: The main focus for the IEEE Transactions on Consumer Electronics is the engineering and research aspects of the theory, design, construction, manufacture or end use of mass market electronics, systems, software and services for consumers.