Title: CIEG-Net: Context Information Enhanced Gated Network for multimodal sentiment analysis
Authors: Zhongyuan Chen, Chong Lu, Yihan Wang
Journal: Pattern Recognition, vol. 168, Article 111785 (Impact Factor 7.6; JCR Q1, Computer Science, Artificial Intelligence)
Published: 2025-05-13
DOI: 10.1016/j.patcog.2025.111785
URL: https://www.sciencedirect.com/science/article/pii/S0031320325004455
Citations: 0
Abstract
Multimodal sentiment analysis is a widely studied field that aims to recognize sentiment from multiple modalities. The primary challenge lies in developing high-quality fusion frameworks that address both the heterogeneity among modalities and the loss of features during fusion. Existing research, however, has focused primarily on cross-modal fusion and paid relatively little attention to the sentiment semantics conveyed by context information. In this paper, we propose the Context Information Enhanced Gated Network (CIEG-Net), a novel fusion network that enhances multimodal fusion by incorporating context information from the input modalities. Specifically, we first construct a context information enhanced module to obtain the input and the corresponding context information for the text and audio modalities. We then design a fusion network module that fuses the text–audio modality with the respective text-context and audio-context information. Finally, we propose a gated network module that dynamically adjusts the weights of each modality and its context information, further strengthening multimodal fusion and attempting to recover missing features. We evaluate the proposed model on three publicly available multimodal sentiment analysis datasets: CMU-MOSI, CMU-MOSEI, and CH-SIMS. Experimental results show that our model significantly outperforms current state-of-the-art (SOTA) models.
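The abstract does not give the equations of the gated network module, so the following is only a minimal sketch of the general idea of input-dependent gating, not the authors' implementation. It assumes four equal-length feature vectors (text, audio, text context, audio context) and uses a generic softmax gate. The function names, the linear scoring layer, and the random values are all hypothetical.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def gated_fusion(features, weights, biases):
    """Fuse k feature vectors of equal length d with input-dependent gates.

    features: list of k vectors (lists of d floats)
    weights:  k rows of length k*d, one linear scoring row per stream (assumed)
    biases:   k floats
    Returns the fused vector (length d) and the k gate values.
    """
    # Concatenate all streams so every gate can see every input.
    concat = [x for vec in features for x in vec]
    # One scalar score per stream, then normalise the scores into gates.
    scores = [
        sum(w * x for w, x in zip(row, concat)) + b
        for row, b in zip(weights, biases)
    ]
    gates = softmax(scores)
    # Weighted sum of the streams, element by element.
    d = len(features[0])
    fused = [sum(g * vec[i] for g, vec in zip(gates, features)) for i in range(d)]
    return fused, gates


# Toy inputs: random vectors stand in for learned modality features.
random.seed(0)
d = 4
names = ["text", "audio", "text_context", "audio_context"]
features = [[random.gauss(0, 1) for _ in range(d)] for _ in names]
weights = [[random.gauss(0, 0.1) for _ in range(len(names) * d)] for _ in names]
biases = [0.0] * len(names)

fused, gates = gated_fusion(features, weights, biases)
for name, g in zip(names, gates):
    print(f"{name}: gate = {g:.3f}")
```

Because the gates are computed from the inputs themselves, the relative weight of each modality and its context can change from sample to sample. In a trained model the scoring parameters would be learned rather than drawn at random.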
About the journal:
The field of Pattern Recognition is both mature and rapidly evolving, playing a crucial role in various related fields such as computer vision, image processing, text analysis, and neural networks. It closely intersects with machine learning and is being applied in emerging areas like biometrics, bioinformatics, multimedia data analysis, and data science. The journal Pattern Recognition, established half a century ago during the early days of computer science, has since grown significantly in scope and influence.