面向情感分析的半监督模态关联感知多模态转换器

IF 9.8 2区计算机科学 Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

IEEE Transactions on Affective Computing Pub Date : 2025-04-10 DOI:10.1109/TAFFC.2025.3559866

Yangmin Li;Ruiqi Zhu;Wengen Li

{"title":"面向情感分析的半监督模态关联感知多模态转换器","authors":"Yangmin Li;Ruiqi Zhu;Wengen Li","doi":"10.1109/TAFFC.2025.3559866","DOIUrl":null,"url":null,"abstract":"Multimodal sentiment analysis is an active research area that combines multiple data modalities, e.g., text, image and audio, to analyze human emotions, and benefits a variety of applications. Existing multimodal sentiment analysis methods can be roughly classified as modality interaction-based methods, modality transformation-based methods and modality similarity-based methods. However, most of these methods highly rely on the strong correlations between modalities, and cannot fully uncover and utilize the correlations between modalities to enhance sentiment analysis. Therefore, these methods usually achieve unsatisfactory performance for identifying the sentiment of multimodal data with weak correlations. To address this issue, we proposed a two-stage semi-supervised model termed Correlation-aware Multimodal Transformer (CorMulT) which consists of pre-training stage and prediction stage. At the pre-training stage, a modality correlation contrastive learning module is designed to efficiently learn modality correlation coefficients between different modalities. At the prediction stage, the learned correlation coefficients are fused with modality representations to make the sentiment prediction. According to the experiments on the popular multimodal dataset CMU-MOSEI, CorMulT obviously surpasses the state-of-the-art multimodal sentiment analysis methods.","PeriodicalId":13131,"journal":{"name":"IEEE Transactions on Affective Computing","volume":"16 3","pages":"2321-2333"},"PeriodicalIF":9.8000,"publicationDate":"2025-04-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"CorMulT: A Semi-Supervised Modality Correlation-Aware Multimodal Transformer for Sentiment Analysis\",\"authors\":\"Yangmin Li;Ruiqi Zhu;Wengen Li\",\"doi\":\"10.1109/TAFFC.2025.3559866\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Multimodal sentiment analysis is an active research area that combines multiple data modalities, e.g., text, image and audio, to analyze human emotions, and benefits a variety of applications. Existing multimodal sentiment analysis methods can be roughly classified as modality interaction-based methods, modality transformation-based methods and modality similarity-based methods. However, most of these methods highly rely on the strong correlations between modalities, and cannot fully uncover and utilize the correlations between modalities to enhance sentiment analysis. Therefore, these methods usually achieve unsatisfactory performance for identifying the sentiment of multimodal data with weak correlations. To address this issue, we proposed a two-stage semi-supervised model termed Correlation-aware Multimodal Transformer (CorMulT) which consists of pre-training stage and prediction stage. At the pre-training stage, a modality correlation contrastive learning module is designed to efficiently learn modality correlation coefficients between different modalities. At the prediction stage, the learned correlation coefficients are fused with modality representations to make the sentiment prediction. According to the experiments on the popular multimodal dataset CMU-MOSEI, CorMulT obviously surpasses the state-of-the-art multimodal sentiment analysis methods.\",\"PeriodicalId\":13131,\"journal\":{\"name\":\"IEEE Transactions on Affective Computing\",\"volume\":\"16 3\",\"pages\":\"2321-2333\"},\"PeriodicalIF\":9.8000,\"publicationDate\":\"2025-04-10\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"IEEE Transactions on Affective Computing\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://ieeexplore.ieee.org/document/10960754/\",\"RegionNum\":2,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Affective Computing","FirstCategoryId":"94","ListUrlMain":"https://ieeexplore.ieee.org/document/10960754/","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}

引用次数: 0

摘要

多模态情感分析是一个活跃的研究领域，它结合了多种数据模式，如文本、图像和音频，来分析人类的情感，并有利于各种应用。现有的多模态情感分析方法大致可分为基于情态交互的方法、基于情态转换的方法和基于情态相似度的方法。然而，这些方法大多高度依赖于模态之间的强相关性，不能充分揭示和利用模态之间的相关性来增强情感分析。因此，这些方法在识别具有弱相关性的多模态数据的情感时，通常表现不理想。为了解决这一问题，我们提出了一种两阶段半监督模型，称为关联感知多模态变压器（CorMulT），该模型由预训练阶段和预测阶段组成。在预训练阶段，设计模态相关对比学习模块，有效学习不同模态之间的模态相关系数。在预测阶段，将学习到的相关系数与情态表示融合，进行情感预测。在流行的多模态数据集CMU-MOSEI上的实验表明，CorMulT明显优于目前最先进的多模态情感分析方法。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

CorMulT: A Semi-Supervised Modality Correlation-Aware Multimodal Transformer for Sentiment Analysis

Multimodal sentiment analysis is an active research area that combines multiple data modalities, e.g., text, image and audio, to analyze human emotions, and benefits a variety of applications. Existing multimodal sentiment analysis methods can be roughly classified as modality interaction-based methods, modality transformation-based methods and modality similarity-based methods. However, most of these methods highly rely on the strong correlations between modalities, and cannot fully uncover and utilize the correlations between modalities to enhance sentiment analysis. Therefore, these methods usually achieve unsatisfactory performance for identifying the sentiment of multimodal data with weak correlations. To address this issue, we proposed a two-stage semi-supervised model termed Correlation-aware Multimodal Transformer (CorMulT) which consists of pre-training stage and prediction stage. At the pre-training stage, a modality correlation contrastive learning module is designed to efficiently learn modality correlation coefficients between different modalities. At the prediction stage, the learned correlation coefficients are fused with modality representations to make the sentiment prediction. According to the experiments on the popular multimodal dataset CMU-MOSEI, CorMulT obviously surpasses the state-of-the-art multimodal sentiment analysis methods.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

IEEE Transactions on Affective Computing COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE-COMPUTER SCIENCE, CYBERNETICS

CiteScore

15.00

自引率

6.20%

发文量

174

期刊介绍： The IEEE Transactions on Affective Computing is an international and interdisciplinary journal. Its primary goal is to share research findings on the development of systems capable of recognizing, interpreting, and simulating human emotions and related affective phenomena. The journal publishes original research on the underlying principles and theories that explain how and why affective factors shape human-technology interactions. It also focuses on how techniques for sensing and simulating affect can enhance our understanding of human emotions and processes. Additionally, the journal explores the design, implementation, and evaluation of systems that prioritize the consideration of affect in their usability. We also welcome surveys of existing work that provide new perspectives on the historical and future directions of this field.