{"title":"Text-Guided Reconstruction Network for Sentiment Analysis With Uncertain Missing Modalities","authors":"Piao Shi;Min Hu;Satoshi Nakagawa;Xiangming Zheng;Xuefeng Shi;Fuji Ren","doi":"10.1109/TAFFC.2025.3541743","DOIUrl":null,"url":null,"abstract":"Multimodal Sentiment Analysis (MSA) is an attractive research that aims to integrate sentiment expressed in textual, visual, and acoustic signals. There are two main problems in the existing methods: 1) the dominant role of the text is underutilization in unaligned multimodal data, and 2) the modality under uncertain missing feature is not sufficiently explored. This paper proposes a Text-guided Reconstruction Network (TgRN) for MSA with uncertain missing modalities in non-aligned sequences. The TgRN network includes three primary modules: Text-guided Extraction Module (TEM), Reconstruction Module (RM) and Text-guided Fusion Module (TFM). First, the TEM consists of the text-guided cross attention units and self-attention units to capture inter-modal features and intra-modal features, respectively. Second, leveraging enhanced attention units and a three-way squeeze-and-excitation block, the RM is designed to learn semantic information from incomplete data and reconstruct missing modality features. Third, the TFM utilizes a progressive modality-mixing adaptation gate to explore the dynamic correlations between nonverbal and verbal modalities, effectively addressing the modality gap issue. Finally, under the supervision of sentiment prediction loss and reconstruction loss, the TgRN effectively processes both uncertain missing-modality conditions and ideal complete modality conditions. Extensive experiments on CMU-MOSI and CH-SIMS demonstrate that our proposed method outperforms state-of-the-art approaches.","PeriodicalId":13131,"journal":{"name":"IEEE Transactions on Affective Computing","volume":"16 3","pages":"1825-1838"},"PeriodicalIF":9.8000,"publicationDate":"2025-02-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Affective Computing","FirstCategoryId":"94","ListUrlMain":"https://ieeexplore.ieee.org/document/10884915/","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
Abstract
Multimodal Sentiment Analysis (MSA) is an attractive research area that aims to integrate sentiment expressed in textual, visual, and acoustic signals. Existing methods suffer from two main problems: 1) the dominant role of text is underutilized in unaligned multimodal data, and 2) modalities with uncertainly missing features are not sufficiently explored. This paper proposes a Text-guided Reconstruction Network (TgRN) for MSA with uncertain missing modalities in non-aligned sequences. The TgRN comprises three primary modules: a Text-guided Extraction Module (TEM), a Reconstruction Module (RM), and a Text-guided Fusion Module (TFM). First, the TEM consists of text-guided cross-attention units and self-attention units that capture inter-modal and intra-modal features, respectively. Second, leveraging enhanced attention units and a three-way squeeze-and-excitation block, the RM learns semantic information from incomplete data and reconstructs missing modality features. Third, the TFM uses a progressive modality-mixing adaptation gate to explore the dynamic correlations between nonverbal and verbal modalities, effectively addressing the modality-gap issue. Finally, under the supervision of a sentiment prediction loss and a reconstruction loss, the TgRN handles both uncertain missing-modality conditions and the ideal complete-modality condition. Extensive experiments on CMU-MOSI and CH-SIMS demonstrate that the proposed method outperforms state-of-the-art approaches.
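The abstract does not include code, so the following is only a minimal PyTorch sketch of two of the ideas it names: a text-guided cross-attention unit (TEM), where text features query a nonverbal modality, and a three-way squeeze-and-excitation block (RM), which gates the three modalities. All class names, dimensions, the residual connection, and the mean-pooling "squeeze" are assumptions for illustration, not the authors' implementation.

```python
import torch
import torch.nn as nn

class TextGuidedCrossAttention(nn.Module):
    """Sketch of a text-guided cross-attention unit (assumed design):
    text features act as the query, while a nonverbal modality
    (visual or acoustic) supplies the keys and values."""
    def __init__(self, dim: int, num_heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, text: torch.Tensor, nonverbal: torch.Tensor) -> torch.Tensor:
        # text: (batch, T_text, dim); nonverbal: (batch, T_mod, dim)
        attended, _ = self.attn(query=text, key=nonverbal, value=nonverbal)
        return self.norm(text + attended)  # residual connection (assumed)

class ThreeWaySqueezeExcitation(nn.Module):
    """Sketch of a three-way squeeze-and-excitation block (assumed design):
    each modality is pooled ("squeeze") and a small bottleneck network
    produces one gating weight per modality ("excitation")."""
    def __init__(self, dim: int, reduction: int = 4):
        super().__init__()
        self.gate = nn.Sequential(
            nn.Linear(3 * dim, 3 * dim // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(3 * dim // reduction, 3),
            nn.Sigmoid(),
        )

    def forward(self, t, v, a):
        # t, v, a: (batch, seq, dim) text / visual / acoustic features
        pooled = torch.cat([t.mean(dim=1), v.mean(dim=1), a.mean(dim=1)], dim=-1)
        w = self.gate(pooled)  # (batch, 3): one weight per modality
        return (t * w[:, 0, None, None],
                v * w[:, 1, None, None],
                a * w[:, 2, None, None])
```

In the paper's pipeline such units would presumably be stacked with self-attention units and the modality-mixing fusion gate; the sketch above is meant only to make the module descriptions concrete.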
Journal Introduction:
The IEEE Transactions on Affective Computing is an international and interdisciplinary journal. Its primary goal is to share research findings on the development of systems capable of recognizing, interpreting, and simulating human emotions and related affective phenomena. The journal publishes original research on the underlying principles and theories that explain how and why affective factors shape human-technology interactions. It also focuses on how techniques for sensing and simulating affect can enhance our understanding of human emotions and processes. Additionally, the journal explores the design, implementation, and evaluation of systems that prioritize the consideration of affect in their usability. We also welcome surveys of existing work that provide new perspectives on the historical and future directions of this field.