RCLMuFN: Relational context learning and multiplex fusion network for multimodal sarcasm detection

IF 7.2 · Zone 1, Computer Science · JCR Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Knowledge-Based Systems · Pub Date: 2025-05-02 · DOI: 10.1016/j.knosys.2025.113614

Tongguan Wang, Junkai Li, Guixin Su, Yongcheng Zhang, Dongyu Su, Yuxue Hu, Ying Sha

Volume 319, Article 113614 · https://www.sciencedirect.com/science/article/pii/S0950705125006604

Citations: 0

Abstract

Sarcasm typically conveys contempt or criticism by expressing a meaning contrary to the speaker’s true intent. Accurately detecting sarcasm helps identify and filter undesirable information on the Internet, thereby mitigating malicious defamation and rumor-mongering. Nonetheless, automatic sarcasm detection remains challenging for machines because it depends critically on intricate factors such as relational context. Existing multimodal sarcasm detection methods introduce graph structures to establish entity relationships between text and image. However, they neglect to learn the relational context between the two modalities, which is crucial evidence for understanding sarcastic meaning. In addition, the meaning of sarcasm shifts across contexts, and current methods may struggle to model such dynamic changes accurately, which limits their generalization ability. To address these issues, we propose a relational context learning and multiplex fusion network (RCLMuFN) for multimodal sarcasm detection. First, we employ four feature extractors to comprehensively extract features from raw text and images, aiming to uncover latent features that may have been overlooked previously. Second, we propose a relational context learning module that learns the contextual information of text and images and captures its dynamic properties through shallow and deep interactions. Finally, we propose a multiplex feature fusion module that improves the model’s generalization by integrating multimodal features derived from diverse interaction contexts. Extensive experiments on two multimodal sarcasm detection datasets show that RCLMuFN achieves state-of-the-art performance.
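The abstract names the modules but does not describe their internals. As a rough illustration only, the toy Python sketch below rests on several assumptions:

- Text tokens and image regions have already been encoded as vectors of equal dimension. This stands in for the four feature extractors.
- A "shallow" interaction is an element-wise product of mean-pooled features.
- A "deep" interaction is stacked text-to-image cross-attention with residual connections.
- Plain concatenation substitutes for multiplex fusion, followed by a logistic classifier head.

These choices and all function names are hypothetical. This is not the RCLMuFN implementation, and it has no learned parameters.

```python
import math
from typing import List

Vector = List[float]
Tokens = List[Vector]  # one vector per text token or image region


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: List[float]) -> List[float]:
    # Subtract the max before exponentiating for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def mean_pool(seq: Tokens) -> Vector:
    # Average over the sequence dimension.
    return [sum(col) / len(seq) for col in zip(*seq)]


def shallow_interaction(text: Tokens, image: Tokens) -> Vector:
    # Hypothetical "shallow" interaction: element-wise product of pooled features.
    t, v = mean_pool(text), mean_pool(image)
    return [a * b for a, b in zip(t, v)]


def cross_attention(queries: Tokens, keys: Tokens) -> Tokens:
    # Scaled dot-product attention; keys double as values (no learned projections).
    d = len(queries[0])
    out = []
    for q in queries:
        weights = softmax([dot(q, k) / math.sqrt(d) for k in keys])
        out.append([sum(w * k[i] for w, k in zip(weights, keys)) for i in range(d)])
    return out


def deep_interaction(text: Tokens, image: Tokens, layers: int = 2) -> Vector:
    # Hypothetical "deep" interaction: stacked text-to-image cross-attention
    # with a residual connection after each layer, then mean pooling.
    h = text
    for _ in range(layers):
        attended = cross_attention(h, image)
        h = [[a + b for a, b in zip(x, y)] for x, y in zip(h, attended)]
    return mean_pool(h)


def fuse(*views: Vector) -> Vector:
    # Stand-in for multiplex fusion: concatenate views from different interaction contexts.
    fused: Vector = []
    for v in views:
        fused.extend(v)
    return fused


def sarcasm_score(fused: Vector, weights: Vector, bias: float = 0.0) -> float:
    # Logistic classifier head; a trained model would learn weights and bias.
    return 1.0 / (1.0 + math.exp(-(dot(fused, weights) + bias)))


if __name__ == "__main__":
    text_tokens = [[1.0, 0.0], [0.0, 1.0]]
    image_regions = [[0.5, 0.5], [1.0, -1.0]]
    fused = fuse(shallow_interaction(text_tokens, image_regions),
                 deep_interaction(text_tokens, image_regions))
    print(len(fused), sarcasm_score(fused, [0.5] * len(fused)))
```

The example uses two 2-dimensional text tokens and two image regions. The fused vector then has 2 × 2 = 4 entries, half from each interaction context. Concatenation was chosen only because it is the simplest way to keep the views distinguishable for the classifier.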




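Source journal

Knowledge-Based Systems (Engineering & Technology: Computer Science, Artificial Intelligence)

CiteScore: 14.80

Self-citation rate: 12.50%

Articles published: 1245

Review time: 7.8 months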

Journal description: Knowledge-Based Systems, an international and interdisciplinary journal in artificial intelligence, publishes original, innovative, and creative research results in the field. It focuses on systems built on knowledge-based and other artificial intelligence techniques. The journal aims to support human prediction and decision-making through data science and computation techniques, to provide balanced coverage of theory and practical study, and to encourage the development and implementation of knowledge-based intelligence models, methods, systems, and software tools. Applications in business, government, education, engineering, and healthcare are emphasized.