{"title":"Text-Guided Reconstruction Network for Sentiment Analysis With Uncertain Missing Modalities","authors":"Piao Shi;Min Hu;Satoshi Nakagawa;Xiangming Zheng;Xuefeng Shi;Fuji Ren","doi":"10.1109/TAFFC.2025.3541743","DOIUrl":null,"url":null,"abstract":"Multimodal Sentiment Analysis (MSA) is an attractive research that aims to integrate sentiment expressed in textual, visual, and acoustic signals. There are two main problems in the existing methods: 1) the dominant role of the text is underutilization in unaligned multimodal data, and 2) the modality under uncertain missing feature is not sufficiently explored. This paper proposes a Text-guided Reconstruction Network (TgRN) for MSA with uncertain missing modalities in non-aligned sequences. The TgRN network includes three primary modules: Text-guided Extraction Module (TEM), Reconstruction Module (RM) and Text-guided Fusion Module (TFM). First, the TEM consists of the text-guided cross attention units and self-attention units to capture inter-modal features and intra-modal features, respectively. Second, leveraging enhanced attention units and a three-way squeeze-and-excitation block, the RM is designed to learn semantic information from incomplete data and reconstruct missing modality features. Third, the TFM utilizes a progressive modality-mixing adaptation gate to explore the dynamic correlations between nonverbal and verbal modalities, effectively addressing the modality gap issue. Finally, under the supervision of sentiment prediction loss and reconstruction loss, the TgRN effectively processes both uncertain missing-modality conditions and ideal complete modality conditions. Extensive experiments on CMU-MOSI and CH-SIMS demonstrate that our proposed method outperforms state-of-the-art approaches.","PeriodicalId":13131,"journal":{"name":"IEEE Transactions on Affective Computing","volume":"16 3","pages":"1825-1838"},"PeriodicalIF":9.8000,"publicationDate":"2025-02-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Affective Computing","FirstCategoryId":"94","ListUrlMain":"https://ieeexplore.ieee.org/document/10884915/","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
Abstract
Multimodal Sentiment Analysis (MSA) is an attractive research area that aims to integrate sentiment expressed in textual, visual, and acoustic signals. Existing methods suffer from two main problems: 1) the dominant role of text is underutilized in unaligned multimodal data, and 2) modalities with uncertainly missing features are not sufficiently explored. This paper proposes a Text-guided Reconstruction Network (TgRN) for MSA with uncertain missing modalities in non-aligned sequences. The TgRN comprises three primary modules: a Text-guided Extraction Module (TEM), a Reconstruction Module (RM), and a Text-guided Fusion Module (TFM). First, the TEM consists of text-guided cross-attention units and self-attention units that capture inter-modal and intra-modal features, respectively. Second, leveraging enhanced attention units and a three-way squeeze-and-excitation block, the RM learns semantic information from incomplete data and reconstructs missing modality features. Third, the TFM uses a progressive modality-mixing adaptation gate to explore the dynamic correlations between nonverbal and verbal modalities, effectively addressing the modality-gap issue. Finally, under the supervision of a sentiment prediction loss and a reconstruction loss, the TgRN handles both uncertain missing-modality conditions and the ideal complete-modality condition. Extensive experiments on CMU-MOSI and CH-SIMS demonstrate that the proposed method outperforms state-of-the-art approaches.
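The abstract does not include code, so the following is only a minimal PyTorch sketch of two of the ideas it names: a text-guided cross-attention unit (TEM), where text features query a nonverbal modality, and a three-way squeeze-and-excitation block (RM), which gates the three modalities. All class names, dimensions, the residual connection, and the mean-pooling "squeeze" are assumptions for illustration, not the authors' implementation.

```python
import torch
import torch.nn as nn

class TextGuidedCrossAttention(nn.Module):
    """Sketch of a text-guided cross-attention unit (assumed design):
    text features act as the query, while a nonverbal modality
    (visual or acoustic) supplies the keys and values."""
    def __init__(self, dim: int, num_heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, text: torch.Tensor, nonverbal: torch.Tensor) -> torch.Tensor:
        # text: (batch, T_text, dim); nonverbal: (batch, T_mod, dim)
        attended, _ = self.attn(query=text, key=nonverbal, value=nonverbal)
        return self.norm(text + attended)  # residual connection (assumed)

class ThreeWaySqueezeExcitation(nn.Module):
    """Sketch of a three-way squeeze-and-excitation block (assumed design):
    each modality is pooled ("squeeze") and a small bottleneck network
    produces one gating weight per modality ("excitation")."""
    def __init__(self, dim: int, reduction: int = 4):
        super().__init__()
        self.gate = nn.Sequential(
            nn.Linear(3 * dim, 3 * dim // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(3 * dim // reduction, 3),
            nn.Sigmoid(),
        )

    def forward(self, t, v, a):
        # t, v, a: (batch, seq, dim) text / visual / acoustic features
        pooled = torch.cat([t.mean(dim=1), v.mean(dim=1), a.mean(dim=1)], dim=-1)
        w = self.gate(pooled)  # (batch, 3): one weight per modality
        return (t * w[:, 0, None, None],
                v * w[:, 1, None, None],
                a * w[:, 2, None, None])
```

In the paper's pipeline such units would presumably be stacked with self-attention units and the modality-mixing fusion gate; the sketch above is meant only to make the module descriptions concrete.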
Journal Introduction:
The IEEE Transactions on Affective Computing is an international and interdisciplinary journal. Its primary goal is to share research findings on the development of systems capable of recognizing, interpreting, and simulating human emotions and related affective phenomena. The journal publishes original research on the underlying principles and theories that explain how and why affective factors shape human-technology interactions. It also focuses on how techniques for sensing and simulating affect can enhance our understanding of human emotions and processes. Additionally, the journal explores the design, implementation, and evaluation of systems that prioritize the consideration of affect in their usability. We also welcome surveys of existing work that provide new perspectives on the historical and future directions of this field.