Joint Multi-Scale Multimodal Transformer for Emotion Using Consumer Devices

Mustaqeem Khan; Jamil Ahmad; Wail Gueaieb; Giulia De Masi; Fakhri Karray; Abdulmotaleb El Saddik

IEEE Transactions on Consumer Electronics, vol. 71, no. 1, pp. 1092-1101. Published 2025-01-21. DOI: 10.1109/TCE.2025.3532322. IEEE Xplore: https://ieeexplore.ieee.org/document/10848157/

Abstract: The field of Multimodal Emotion Recognition (MER) has made considerable advances in recent years; however, the synergistic relationships between different modalities remain largely untapped. This paper introduces an MER approach employing a Joint Multi-Scale Multimodal Transformer (JMMT) with recursive cross-attention for naturalistic emotion recognition, enhancing and capturing inter- and intra-modal relationships across the visual and audio modalities. We compute multi-scale attention weights from cross-correlations between multi-scale joint representations of combined and individual cues to capture inter- and intra-modal dynamics. The outputs of the individual modalities are recursively fed back during fusion to further refine the features. By capturing synergistic characteristics across visual and audio inputs, the JMMT model offers a cost-effective solution for consumer devices. It outperforms state-of-the-art (SOTA) MER methods, as evaluated on the IEMOCAP and MELD datasets.
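The abstract does not give implementation details of the recursive cross-attention fusion, so the following is only an illustrative NumPy sketch of the general idea it describes: each modality attends to the other, and the refined features are fed back as the next iteration's input. All function names, the residual-feedback form, and the mean-pooled fusion are assumptions, not the authors' actual architecture.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(query, key_value):
    # Scaled dot-product attention: one modality's features (query)
    # attend over the other modality's features (key_value).
    d = query.shape[-1]
    scores = query @ key_value.T / np.sqrt(d)   # (T_q, T_kv)
    return softmax(scores) @ key_value          # (T_q, d)

def recursive_cross_attention_fusion(audio, visual, steps=2):
    # Hypothetical recursion: at each step, audio attends to visual
    # cues and vice versa, and the refined features are fed back
    # (here via a residual connection) as the next step's inputs.
    a, v = audio, visual
    for _ in range(steps):
        a_ref = cross_attention(a, v)   # audio refined by visual context
        v_ref = cross_attention(v, a)   # visual refined by audio context
        a, v = a + a_ref, v + v_ref     # recursive feedback of outputs
    # Fuse by mean-pooling each refined stream and concatenating.
    return np.concatenate([a.mean(axis=0), v.mean(axis=0)])
```

For example, audio features of shape (5, 8) and visual features of shape (7, 8) fuse into a single 16-dimensional joint embedding; real systems would use learned projections and multi-scale feature pyramids rather than raw features.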
Journal Introduction:
The main focus of the IEEE Transactions on Consumer Electronics is the engineering and research aspects of the theory, design, construction, manufacture, and end use of mass-market electronics, systems, software, and services for consumers.