Modality emotion semantic correlation analysis for multimodal emotion recognition

IF 4.0 · Zone 3 (Computer Science) · JCR Q1 (Computer Science, Hardware & Architecture)

Computers & Electrical Engineering · Pub Date: 2025-06-10 · DOI: 10.1016/j.compeleceng.2025.110467

Yuqing Zhang, Dongliang Xie, Dawei Luo, Baosheng Sun

Computers & Electrical Engineering, vol. 126, Article 110467. Journal article; not open access. https://www.sciencedirect.com/science/article/pii/S0045790625004100

Citations: 0

Abstract



Affective computing is a fundamental technology and a crucial prerequisite for natural, human-like human–computer interaction. However, emotional expression is complex and multi-dimensional, and the heterogeneity gap among modalities poses significant challenges for multimodal emotion recognition. To tackle this issue, we propose modality emotion semantic correlation analysis (MESCA), which enhances multimodal affective semantic consistency by using modality correlation learning to make the information in different modalities complement each other. Specifically, we first design a modal-pair correlation module that calculates emotion semantic consistency across text, audio and video information. By fusing complementary semantic information, this module contributes to a comprehensive understanding of the emotional state and helps mitigate the redundancy found in pairwise interaction methods. Next, we introduce structural re-parameterization, which transforms the multi-branch training structure into a single-branch inference structure, to address excessive computational cost and make recognition more efficient. The proposed model is evaluated on two public datasets, IEMOCAP and CMU-MOSEI. Compared to baseline methods, MESCA significantly improves efficiency while maintaining prediction accuracy on IEMOCAP, and outperforms them in both efficiency and accuracy on CMU-MOSEI.
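The abstract does not describe MESCA's layers, so the following is only a minimal sketch of the general idea behind structural re-parameterization, not the authors' implementation. It assumes the parallel training-time branches are purely linear (plus an identity skip branch), in which case their weights and biases can be summed into one equivalent layer for inference. All function names and the branch count are hypothetical.

```python
import random


def matvec(weight, x):
    """Multiply a (rows x cols) weight matrix by a vector of length cols."""
    return [sum(w * v for w, v in zip(row, x)) for row in weight]


def multi_branch_forward(x, branches, use_identity=True):
    """Training-time form: sum the outputs of parallel linear branches.

    Each branch is a (weight, bias) pair. The optional identity branch adds
    the input itself, so the weights must be square when it is enabled.
    """
    out = [0.0] * len(branches[0][1])
    for weight, bias in branches:
        y = matvec(weight, x)
        out = [o + yi + b for o, yi, b in zip(out, y, bias)]
    if use_identity:
        out = [o + xi for o, xi in zip(out, x)]
    return out


def merge_branches(branches, use_identity=True):
    """Fold every branch into a single (weight, bias) pair for inference."""
    dim_out = len(branches[0][0])
    dim_in = len(branches[0][0][0])
    merged_w = [[0.0] * dim_in for _ in range(dim_out)]
    merged_b = [0.0] * dim_out
    for weight, bias in branches:
        for i in range(dim_out):
            merged_b[i] += bias[i]
            for j in range(dim_in):
                merged_w[i][j] += weight[i][j]
    if use_identity:
        for i in range(dim_out):
            merged_w[i][i] += 1.0  # the identity branch is the identity matrix
    return merged_w, merged_b


def single_branch_forward(x, merged):
    """Inference-time form: one matrix multiply and one bias add."""
    weight, bias = merged
    return [y + b for y, b in zip(matvec(weight, x), bias)]


def random_branch(dim, rng):
    """Create a random square linear branch (illustrative data only)."""
    weight = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(dim)]
    bias = [rng.uniform(-1, 1) for _ in range(dim)]
    return weight, bias


rng = random.Random(0)
dim = 4
branches = [random_branch(dim, rng) for _ in range(3)]
x = [rng.uniform(-1, 1) for _ in range(dim)]

train_out = multi_branch_forward(x, branches)
infer_out = single_branch_forward(x, merge_branches(branches))
max_gap = max(abs(a - b) for a, b in zip(train_out, infer_out))
print("largest difference between the two forms:", max_gap)
```

The merged layer performs one matrix multiply instead of three, which is where the inference-time saving would come from in this simplified setting. Real networks usually also fold normalization layers into the merged weights, and any nonlinearity between branches would prevent this exact merge.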


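Source journal

Computers & Electrical Engineering (Engineering & Technology: Engineering, Electrical & Electronic)

CiteScore: 9.20
Self-citation rate: 7.00%
Articles published: 661
Review time: 47 days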

About the journal: The impact of computers has nowhere been more revolutionary than in electrical engineering. The design, analysis, and operation of electrical and electronic systems are now dominated by computers, a transformation that has been motivated by the natural ease of interface between computers and electrical systems, and the promise of spectacular improvements in speed and efficiency. Published since 1973, Computers & Electrical Engineering provides rapid publication of topical research into the integration of computer technology and computational techniques with electrical and electronic systems. The journal publishes papers featuring novel implementations of computers and computational techniques in areas like signal and image processing, high-performance computing, parallel processing, and communications. Special attention will be paid to papers describing innovative architectures, algorithms, and software tools.