{"title":"多模态情感分析的情感感知解纠缠表征转移","authors":"Sicheng Zhao;Zhenhua Yang;Henglin Shi;Xiaocheng Feng;Lingpengkun Meng;Bing Qin;Chenggang Yan;Jianhua Tao;Guiguang Ding","doi":"10.1109/TAFFC.2025.3539225","DOIUrl":null,"url":null,"abstract":"Multimodal sentiment analysis (MSA) aims to leverage the complementary information from multiple modalities for affective understanding of user-generated videos. Existing methods mainly focused on designing sophisticated feature fusion strategies to integrate the separately extracted multimodal representations, ignoring the interference of the information irrelevant to sentiment. In this paper, we propose to disentangle the unimodal representations into sentiment-specific and sentiment-independent features, the former of which are fused for the MSA task. Specifically, we design a novel Sentiment-aware Disentangled Representation Shifting framework, termed SDRS, with two components. <bold>Interactive sentiment-aware representation disentanglement</b> aims to extract sentiment-specific feature representations for each nonverbal modality by considering the contextual influence of other modalities with the newly developed cross-attention autoencoder. <bold>Attentive cross-modal representation shifting</b> tries to shift the textual representation in a latent token space using the nonverbal sentiment-specific representations after projection. The shifted representation is finally employed to fine-tune a pre-trained language model for multimodal sentiment analysis. Extensive experiments are conducted on three public benchmark datasets, i.e., CMU-MOSI, CMU-MOSEI, and CH-SIMS. The results demonstrate that the proposed SDRS framework not only obtains state-of-the-art results based solely on multimodal labels but also outperforms the methods that additionally require the labels of each modality.","PeriodicalId":13131,"journal":{"name":"IEEE Transactions on Affective Computing","volume":"16 3","pages":"1802-1813"},"PeriodicalIF":9.8000,"publicationDate":"2025-02-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"SDRS: Sentiment-Aware Disentangled Representation Shifting for Multimodal Sentiment Analysis\",\"authors\":\"Sicheng Zhao;Zhenhua Yang;Henglin Shi;Xiaocheng Feng;Lingpengkun Meng;Bing Qin;Chenggang Yan;Jianhua Tao;Guiguang Ding\",\"doi\":\"10.1109/TAFFC.2025.3539225\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Multimodal sentiment analysis (MSA) aims to leverage the complementary information from multiple modalities for affective understanding of user-generated videos. Existing methods mainly focused on designing sophisticated feature fusion strategies to integrate the separately extracted multimodal representations, ignoring the interference of the information irrelevant to sentiment. In this paper, we propose to disentangle the unimodal representations into sentiment-specific and sentiment-independent features, the former of which are fused for the MSA task. Specifically, we design a novel Sentiment-aware Disentangled Representation Shifting framework, termed SDRS, with two components. <bold>Interactive sentiment-aware representation disentanglement</b> aims to extract sentiment-specific feature representations for each nonverbal modality by considering the contextual influence of other modalities with the newly developed cross-attention autoencoder. 
<bold>Attentive cross-modal representation shifting</b> tries to shift the textual representation in a latent token space using the nonverbal sentiment-specific representations after projection. The shifted representation is finally employed to fine-tune a pre-trained language model for multimodal sentiment analysis. Extensive experiments are conducted on three public benchmark datasets, i.e., CMU-MOSI, CMU-MOSEI, and CH-SIMS. The results demonstrate that the proposed SDRS framework not only obtains state-of-the-art results based solely on multimodal labels but also outperforms the methods that additionally require the labels of each modality.\",\"PeriodicalId\":13131,\"journal\":{\"name\":\"IEEE Transactions on Affective Computing\",\"volume\":\"16 3\",\"pages\":\"1802-1813\"},\"PeriodicalIF\":9.8000,\"publicationDate\":\"2025-02-06\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"IEEE Transactions on Affective Computing\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://ieeexplore.ieee.org/document/10876597/\",\"RegionNum\":2,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Affective Computing","FirstCategoryId":"94","ListUrlMain":"https://ieeexplore.ieee.org/document/10876597/","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
SDRS: Sentiment-Aware Disentangled Representation Shifting for Multimodal Sentiment Analysis
Multimodal sentiment analysis (MSA) aims to leverage complementary information from multiple modalities for affective understanding of user-generated videos. Existing methods have mainly focused on designing sophisticated feature fusion strategies to integrate separately extracted multimodal representations, ignoring the interference of sentiment-irrelevant information. In this paper, we propose to disentangle the unimodal representations into sentiment-specific and sentiment-independent features, and to fuse only the former for the MSA task. Specifically, we design a novel Sentiment-aware Disentangled Representation Shifting framework, termed SDRS, with two components. Interactive sentiment-aware representation disentanglement extracts sentiment-specific feature representations for each nonverbal modality by considering the contextual influence of the other modalities through a newly developed cross-attention autoencoder. Attentive cross-modal representation shifting then shifts the textual representation in a latent token space using the projected nonverbal sentiment-specific representations. The shifted representation is finally used to fine-tune a pre-trained language model for multimodal sentiment analysis. Extensive experiments on three public benchmark datasets, i.e., CMU-MOSI, CMU-MOSEI, and CH-SIMS, demonstrate that the proposed SDRS framework not only achieves state-of-the-art results using only multimodal labels but also outperforms methods that additionally require per-modality labels.
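To make the two components of the abstract concrete, below is a minimal PyTorch sketch of one plausible reading of the pipeline: a cross-attention autoencoder that splits a nonverbal modality into sentiment-specific and sentiment-independent parts, and an attentive shifting module that projects the sentiment-specific features into the language model's token space and adds them to the textual embeddings. The dimensions, the reconstruction loss, and the additive shift rule are assumptions for illustration, not the authors' reference implementation.

```python
# Hedged sketch of the SDRS-style components described in the abstract.
# Module names, dimensions, and losses are assumptions, not the paper's exact design.
import torch
import torch.nn as nn


class CrossAttentionAutoencoder(nn.Module):
    """Disentangles one nonverbal modality (e.g., audio) into sentiment-specific
    and sentiment-independent features, attending over the other modalities as context."""

    def __init__(self, dim: int, num_heads: int = 4):
        super().__init__()
        self.cross_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.to_specific = nn.Linear(dim, dim)      # sentiment-specific branch
        self.to_independent = nn.Linear(dim, dim)   # sentiment-independent branch
        self.decoder = nn.Linear(2 * dim, dim)      # reconstructs the input features

    def forward(self, x, context):
        # x:       (batch, seq, dim) features of the target nonverbal modality
        # context: (batch, ctx_seq, dim) features of the other modalities
        attended, _ = self.cross_attn(query=x, key=context, value=context)
        specific = self.to_specific(attended)
        independent = self.to_independent(attended)
        recon = self.decoder(torch.cat([specific, independent], dim=-1))
        recon_loss = nn.functional.mse_loss(recon, x)  # one plausible reconstruction objective
        return specific, independent, recon_loss


class AttentiveRepresentationShift(nn.Module):
    """Shifts textual token embeddings using projected nonverbal sentiment-specific features."""

    def __init__(self, nonverbal_dim: int, text_dim: int, num_heads: int = 4):
        super().__init__()
        self.project = nn.Linear(nonverbal_dim, text_dim)
        self.attn = nn.MultiheadAttention(text_dim, num_heads, batch_first=True)

    def forward(self, text_embeds, nonverbal_specific):
        # text_embeds:        (batch, tokens, text_dim) token embeddings of the language model
        # nonverbal_specific: (batch, seq, nonverbal_dim) sentiment-specific nonverbal features
        projected = self.project(nonverbal_specific)
        shift, _ = self.attn(query=text_embeds, key=projected, value=projected)
        return text_embeds + shift  # shifted representation fed to the pre-trained LM
```

In this reading, the shifted token embeddings would replace the language model's original input embeddings during fine-tuning on the sentiment regression objective; how the shift is injected into the pre-trained model is left abstract here.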
About the journal:
The IEEE Transactions on Affective Computing is an international and interdisciplinary journal. Its primary goal is to share research findings on the development of systems capable of recognizing, interpreting, and simulating human emotions and related affective phenomena. The journal publishes original research on the underlying principles and theories that explain how and why affective factors shape human-technology interactions. It also focuses on how techniques for sensing and simulating affect can enhance our understanding of human emotions and processes. Additionally, the journal explores the design, implementation, and evaluation of systems that prioritize the consideration of affect in their usability. We also welcome surveys of existing work that provide new perspectives on the historical and future directions of this field.