Jiaqi Liu , Yong Wang , Jing Yang , Fanshu Shang , Fan He
{"title":"Prompt-matching synthesis model for missing modalities in sentiment analysis","authors":"Jiaqi Liu , Yong Wang , Jing Yang , Fanshu Shang , Fan He","doi":"10.1016/j.knosys.2025.113519","DOIUrl":null,"url":null,"abstract":"<div><div>In multimodal sentiment analysis, commentary videos often lack certain sentences or frames, leaving gaps that may contain crucial sentiment cues. Current methods primarily focus on modal fusion, overlooking the uncertainty of missing modalities, which results in underutilized data and less complete and less accurate sentiment analysis. To address these challenges, we propose a prompt-matching synthesis model to handle missing modalities in sentiment analysis. First, we develop unimodal encoders using prompt learning to enhance the model’s understanding of inter-modal relationships during feature extraction. Learnable prompts are introduced before textual modalities, while cross-modal prompts are applied to acoustic and visual modalities. Second, we implement bidirectional cross-modal matching to minimize discrepancies among shared features, employing central moment discrepancy loss across multiple modalities. A comparator is designed to infer features based on the absence of one or two modalities, allowing for the synthesis of missing modality features from available data. Finally, the synthesized modal features are integrated with the initial features, optimizing the fusion loss and central moment discrepancy loss to enhance sentiment analysis accuracy. 
Experimental results demonstrate that our method achieves strong performance on multiple datasets for multimodal sentiment analysis, even with uncertain missing modalities.</div></div>","PeriodicalId":49939,"journal":{"name":"Knowledge-Based Systems","volume":"318 ","pages":"Article 113519"},"PeriodicalIF":7.2000,"publicationDate":"2025-04-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Knowledge-Based Systems","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0950705125005659","RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
Citations: 0
Abstract
In multimodal sentiment analysis, commentary videos often lack certain sentences or frames, leaving gaps that may contain crucial sentiment cues. Current methods primarily focus on modal fusion, overlooking the uncertainty of missing modalities, which results in underutilized data and incomplete, less accurate sentiment analysis. To address these challenges, we propose a prompt-matching synthesis model to handle missing modalities in sentiment analysis. First, we develop unimodal encoders using prompt learning to enhance the model’s understanding of inter-modal relationships during feature extraction. Learnable prompts are prepended to the textual modality, while cross-modal prompts are applied to the acoustic and visual modalities. Second, we implement bidirectional cross-modal matching to minimize discrepancies among shared features, employing a central moment discrepancy loss across multiple modalities. A comparator is designed to infer features when one or two modalities are absent, allowing missing modality features to be synthesized from the available data. Finally, the synthesized modal features are integrated with the initial features, optimizing the fusion loss and central moment discrepancy loss to enhance sentiment analysis accuracy. Experimental results demonstrate that our method achieves strong performance on multiple datasets for multimodal sentiment analysis, even with uncertain missing modalities.
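The abstract describes prepending learnable prompt vectors to the textual modality before encoding. A minimal sketch of that idea, using numpy and hypothetical dimensions (the paper's actual prompt length, embedding size, and encoder are not specified here; in practice the prompt matrix would be a trainable parameter updated by backpropagation):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions: prompt length P, token sequence length T, embedding dim d.
P, T, d = 4, 16, 32

# Learnable prompt vectors (in a real model, trainable parameters).
prompt = rng.normal(size=(P, d))

# Token embeddings for one textual input.
tokens = rng.normal(size=(T, d))

# Prepend the prompts so the encoder attends over [prompt; tokens].
prompted_input = np.concatenate([prompt, tokens], axis=0)

print(prompted_input.shape)  # (20, 32)
```

The encoder then processes the combined sequence, so the prompt positions can steer feature extraction toward cross-modal cues without modifying the token embeddings themselves.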
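The matching step minimizes a central moment discrepancy (CMD) loss between modality features. A simplified numpy sketch of CMD in its standard form (distance between means plus distances between higher-order central moments); the moment order K and the absence of per-order normalization are assumptions, since the paper's exact formulation is not given here:

```python
import numpy as np

def cmd_loss(x, y, k_moments=5):
    """Central moment discrepancy between two feature batches.

    x, y: arrays of shape (batch, feature_dim), assumed scaled to a common
    range so moment terms of different orders are comparable.
    """
    mx, my = x.mean(axis=0), y.mean(axis=0)
    # First term: distance between the means.
    loss = np.linalg.norm(mx - my)
    # Higher-order terms: distance between k-th central moments.
    cx, cy = x - mx, y - my
    for k in range(2, k_moments + 1):
        loss += np.linalg.norm((cx ** k).mean(axis=0) - (cy ** k).mean(axis=0))
    return loss

rng = np.random.default_rng(0)
a = rng.uniform(size=(64, 8))  # e.g. shared textual features
b = rng.uniform(size=(64, 8))  # e.g. shared acoustic features

print(cmd_loss(a, a))      # 0.0 for identical distributions
print(cmd_loss(a, b) > 0)  # True for differing ones
```

Driving this loss toward zero aligns the distributions of the shared features across modalities, which is what makes synthesizing a missing modality's features from the available ones plausible.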
Journal description:
Knowledge-Based Systems is an international, interdisciplinary journal in artificial intelligence that publishes original, innovative, and creative research. It focuses on systems based on knowledge-based and other artificial intelligence techniques. The journal aims to support human prediction and decision-making through data science and computational techniques, to provide balanced coverage of theory and practical study, and to encourage the development and implementation of knowledge-based intelligence models, methods, systems, and software tools. Applications in business, government, education, engineering, and healthcare are emphasized.