Tongguan Wang, Junkai Li, Guixin Su, Yongcheng Zhang, Dongyu Su, Yuxue Hu, Ying Sha
RCLMuFN: Relational context learning and multiplex fusion network for multimodal sarcasm detection
DOI: 10.1016/j.knosys.2025.113614
Knowledge-Based Systems, Volume 319, Article 113614, published 2025-05-02
IF: 7.2, JCR: Q1 (Computer Science, Artificial Intelligence)
Citations: 0
Abstract
Sarcasm typically conveys emotions of contempt or criticism by expressing a meaning that is contrary to the speaker’s true intent. Accurately detecting sarcasm aids in identifying and filtering undesirable information on the Internet, thereby mitigating malicious defamation and rumor-mongering. Nonetheless, automatic sarcasm detection remains a challenging task for machines, as it critically depends on intricate factors such as relational context. Existing multimodal sarcasm detection methods focus on introducing graph structures to establish entity relationships between text and image while neglecting to learn the relational context between text and image, which is crucial evidence for understanding the meaning of sarcasm. In addition, the meaning of sarcasm evolves across different contexts, but current methods may struggle to accurately model such dynamic changes, thereby limiting the generalization ability of the models. To address the aforementioned issues, we propose a relational context learning and multiplex fusion network (RCLMuFN) for multimodal sarcasm detection. First, we employ four feature extractors to comprehensively extract features from raw text and images, aiming to excavate potential features that may have been previously overlooked. Second, we propose a relational context learning module to learn the contextual information of text and images and capture the dynamic properties through shallow and deep interactions. Finally, we propose a multiplex feature fusion module to enhance the model’s generalization by effectively integrating multimodal features derived from diverse interaction contexts. Extensive experiments on two multimodal sarcasm detection datasets show that RCLMuFN achieves state-of-the-art performance.
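The abstract describes three stages: shallow and deep cross-modal interactions between text and image features, followed by a multiplex fusion that combines signals from both interaction depths. The paper's actual architecture is not given here, so the following is only a minimal NumPy sketch of that general idea; all shapes, the element-wise "shallow" gate, and the single-head cross-attention "deep" interaction are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8
text_tokens = rng.normal(size=(5, d))  # hypothetical: 5 text token embeddings
img_regions = rng.normal(size=(4, d))  # hypothetical: 4 image region embeddings

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

# Shallow interaction: element-wise product of pooled modality vectors,
# a cheap first-order cross-modal signal.
t_pool, v_pool = text_tokens.mean(axis=0), img_regions.mean(axis=0)
shallow = t_pool * v_pool                                  # shape (d,)

# Deep interaction: text tokens attend over image regions
# (scaled dot-product cross-attention), then pool.
attn = softmax(text_tokens @ img_regions.T / np.sqrt(d))   # shape (5, 4)
deep = (attn @ img_regions).mean(axis=0)                   # shape (d,)

# Multiplex fusion: concatenate unimodal, shallow, and deep signals
# so a downstream classifier sees all interaction contexts.
fused = np.concatenate([t_pool, v_pool, shallow, deep])    # shape (4*d,)
```

In a real model the fused vector would feed a classification head; here the sketch only shows how features from different interaction depths can be kept side by side rather than collapsed early.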
About the journal:
Knowledge-Based Systems, an international and interdisciplinary journal in artificial intelligence, publishes original, innovative, and creative research results in the field. It focuses on knowledge-based and other artificial intelligence techniques-based systems. The journal aims to support human prediction and decision-making through data science and computation techniques, provide a balanced coverage of theory and practical study, and encourage the development and implementation of knowledge-based intelligence models, methods, systems, and software tools. Applications in business, government, education, engineering, and healthcare are emphasized.