SDRS: Sentiment-Aware Disentangled Representation Shifting for Multimodal Sentiment Analysis

Impact Factor: 9.8 · CAS Tier 2 (Computer Science) · JCR Q1 (Computer Science, Artificial Intelligence)
Sicheng Zhao;Zhenhua Yang;Henglin Shi;Xiaocheng Feng;Lingpengkun Meng;Bing Qin;Chenggang Yan;Jianhua Tao;Guiguang Ding
{"title":"SDRS: Sentiment-Aware Disentangled Representation Shifting for Multimodal Sentiment Analysis","authors":"Sicheng Zhao;Zhenhua Yang;Henglin Shi;Xiaocheng Feng;Lingpengkun Meng;Bing Qin;Chenggang Yan;Jianhua Tao;Guiguang Ding","doi":"10.1109/TAFFC.2025.3539225","DOIUrl":null,"url":null,"abstract":"Multimodal sentiment analysis (MSA) aims to leverage the complementary information from multiple modalities for affective understanding of user-generated videos. Existing methods mainly focused on designing sophisticated feature fusion strategies to integrate the separately extracted multimodal representations, ignoring the interference of the information irrelevant to sentiment. In this paper, we propose to disentangle the unimodal representations into sentiment-specific and sentiment-independent features, the former of which are fused for the MSA task. Specifically, we design a novel Sentiment-aware Disentangled Representation Shifting framework, termed SDRS, with two components. <bold>Interactive sentiment-aware representation disentanglement</b> aims to extract sentiment-specific feature representations for each nonverbal modality by considering the contextual influence of other modalities with the newly developed cross-attention autoencoder. <bold>Attentive cross-modal representation shifting</b> tries to shift the textual representation in a latent token space using the nonverbal sentiment-specific representations after projection. The shifted representation is finally employed to fine-tune a pre-trained language model for multimodal sentiment analysis. Extensive experiments are conducted on three public benchmark datasets, i.e., CMU-MOSI, CMU-MOSEI, and CH-SIMS. The results demonstrate that the proposed SDRS framework not only obtains state-of-the-art results based solely on multimodal labels but also outperforms the methods that additionally require the labels of each modality.","PeriodicalId":13131,"journal":{"name":"IEEE Transactions on Affective Computing","volume":"16 3","pages":"1802-1813"},"PeriodicalIF":9.8000,"publicationDate":"2025-02-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Affective Computing","FirstCategoryId":"94","ListUrlMain":"https://ieeexplore.ieee.org/document/10876597/","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
引用次数: 0

Abstract

Multimodal sentiment analysis (MSA) aims to leverage the complementary information from multiple modalities for affective understanding of user-generated videos. Existing methods mainly focus on designing sophisticated feature fusion strategies to integrate the separately extracted multimodal representations, ignoring the interference of sentiment-irrelevant information. In this paper, we propose to disentangle the unimodal representations into sentiment-specific and sentiment-independent features, the former of which are fused for the MSA task. Specifically, we design a novel Sentiment-aware Disentangled Representation Shifting framework, termed SDRS, with two components. Interactive sentiment-aware representation disentanglement extracts sentiment-specific feature representations for each nonverbal modality by considering the contextual influence of the other modalities through a newly developed cross-attention autoencoder. Attentive cross-modal representation shifting shifts the textual representation in a latent token space using the projected nonverbal sentiment-specific representations. The shifted representation is finally employed to fine-tune a pre-trained language model for multimodal sentiment analysis. Extensive experiments are conducted on three public benchmark datasets, i.e., CMU-MOSI, CMU-MOSEI, and CH-SIMS. The results demonstrate that the proposed SDRS framework not only obtains state-of-the-art results using only multimodal labels but also outperforms methods that additionally require per-modality labels.
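The abstract describes two architectural pieces: a cross-attention autoencoder that splits each nonverbal modality into sentiment-specific and sentiment-independent features, and a shifting module that moves text token representations using the projected nonverbal features. Below is a minimal, hypothetical PyTorch sketch of that idea; the module names, dimensions, gating, and reconstruction head are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn


class CrossAttentionAutoencoder(nn.Module):
    """Splits one nonverbal modality into sentiment-specific and
    sentiment-independent features, attending to another modality as context.
    (Hypothetical sketch; not the paper's implementation.)"""

    def __init__(self, dim: int, n_heads: int = 4):
        super().__init__()
        self.cross_attn = nn.MultiheadAttention(dim, n_heads, batch_first=True)
        self.to_specific = nn.Linear(dim, dim)     # sentiment-specific branch
        self.to_independent = nn.Linear(dim, dim)  # sentiment-independent branch
        self.decoder = nn.Linear(2 * dim, dim)     # reconstruction head (autoencoder objective)

    def forward(self, x, context):
        # x: (B, Lx, dim) nonverbal features; context: (B, Lc, dim) other modality
        attended, _ = self.cross_attn(x, context, context)
        specific = self.to_specific(attended)
        independent = self.to_independent(attended)
        recon = self.decoder(torch.cat([specific, independent], dim=-1))
        return specific, independent, recon


class RepresentationShifter(nn.Module):
    """Shifts text token embeddings with projected nonverbal sentiment-specific
    features; the shifted tokens would then be fed to a pre-trained language model."""

    def __init__(self, text_dim: int, nonverbal_dim: int):
        super().__init__()
        self.project = nn.Linear(nonverbal_dim, text_dim)  # map into the token space
        self.gate = nn.Linear(2 * text_dim, 1)             # simple attentive gate

    def forward(self, text_tokens, nonverbal_specific):
        # text_tokens: (B, L, text_dim); nonverbal_specific: (B, L, nonverbal_dim)
        shift = self.project(nonverbal_specific)
        g = torch.sigmoid(self.gate(torch.cat([text_tokens, shift], dim=-1)))
        return text_tokens + g * shift  # shifted textual representation


if __name__ == "__main__":
    B, L, D = 2, 8, 64
    audio, text_ctx = torch.randn(B, L, D), torch.randn(B, L, D)
    spec, indep, recon = CrossAttentionAutoencoder(D)(audio, text_ctx)
    shifted = RepresentationShifter(text_dim=D, nonverbal_dim=D)(text_ctx, spec)
    print(shifted.shape)  # torch.Size([2, 8, 64])
```

The gated residual shift here is only one plausible reading of "attentive cross-modal representation shifting"; the paper's actual shifting and disentanglement losses should be taken from the full text.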
Source Journal
IEEE Transactions on Affective Computing (Computer Science, Artificial Intelligence; Computer Science, Cybernetics)
CiteScore: 15.00
Self-citation rate: 6.20%
Articles per year: 174
About the Journal: The IEEE Transactions on Affective Computing is an international and interdisciplinary journal. Its primary goal is to share research findings on the development of systems capable of recognizing, interpreting, and simulating human emotions and related affective phenomena. The journal publishes original research on the underlying principles and theories that explain how and why affective factors shape human-technology interactions. It also focuses on how techniques for sensing and simulating affect can enhance our understanding of human emotions and processes. Additionally, the journal explores the design, implementation, and evaluation of systems that prioritize the consideration of affect in their usability. We also welcome surveys of existing work that provide new perspectives on the historical and future directions of this field.