{"title":"Single-Stage Extensive Semantic Fusion for multi-modal sarcasm detection","authors":"Hong Fang , Dahao Liang , Weiyu Xiang","doi":"10.1016/j.array.2024.100344","DOIUrl":null,"url":null,"abstract":"<div><p>With the rise of social media and online interactions, there is a growing need for analytical models capable of understanding the nuanced, multi-modal communication inherent in platforms, especially for detecting sarcasm. Existing research employs multi-stage models along with extensive semantic information extractions and single-modal encoders. These models often struggle with efficient aligning and fusing multi-modal representations. Addressing these shortcomings, we introduce the Single-Stage Extensive Semantic Fusion (SSESF) model, designed to concurrently process multi-modal inputs in a unified framework, which performs encoding and fusing in the same architecture with shared parameters. A projection mechanism is employed to overcome the challenges posed by the diversity of inputs and the integration of a wide range of semantic information. Additionally, we design a multi-objective optimization that enhances the model’s ability to learn latent semantic nuances with supervised contrastive learning. The unified framework emphasizes the interaction and integration of multi-modal data, while multi-objective optimization preserves the complexity of semantic nuances for sarcasm detection. Experimental results on a public multi-modal sarcasm dataset demonstrate the superiority of our model, achieving state-of-the-art performance. The findings highlight the model’s capability to integrate extensive semantic information, demonstrating its effectiveness in the simultaneous interpretation and fusion of multi-modal data for sarcasm detection.</p></div>","PeriodicalId":8417,"journal":{"name":"Array","volume":"22 ","pages":"Article 100344"},"PeriodicalIF":2.3000,"publicationDate":"2024-04-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S2590005624000109/pdfft?md5=5136c2ac1ad918984ba24754918dce68&pid=1-s2.0-S2590005624000109-main.pdf","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Array","FirstCategoryId":"1085","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S2590005624000109","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, THEORY & METHODS","Score":null,"Total":0}
Citations: 0
Abstract
With the rise of social media and online interactions, there is a growing need for analytical models capable of understanding the nuanced, multi-modal communication inherent in these platforms, especially for detecting sarcasm. Existing research employs multi-stage models along with extensive semantic information extraction and single-modal encoders. These models often struggle to efficiently align and fuse multi-modal representations. Addressing these shortcomings, we introduce the Single-Stage Extensive Semantic Fusion (SSESF) model, designed to concurrently process multi-modal inputs in a unified framework that performs encoding and fusion in the same architecture with shared parameters. A projection mechanism is employed to overcome the challenges posed by the diversity of inputs and the integration of a wide range of semantic information. Additionally, we design a multi-objective optimization scheme that uses supervised contrastive learning to enhance the model's ability to learn latent semantic nuances. The unified framework emphasizes the interaction and integration of multi-modal data, while the multi-objective optimization preserves the complexity of semantic nuances for sarcasm detection. Experimental results on a public multi-modal sarcasm dataset demonstrate the superiority of our model, achieving state-of-the-art performance. The findings highlight the model's capability to integrate extensive semantic information, demonstrating its effectiveness in the simultaneous interpretation and fusion of multi-modal data for sarcasm detection.
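To make the abstract's two core ideas concrete, the sketch below illustrates (1) a single-stage design in which projected text and image features are encoded and fused by one shared transformer, and (2) a multi-objective loss that combines cross-entropy classification with a supervised contrastive term. This is a minimal illustration assuming generic PyTorch components; the class and function names (SingleStageFusionSketch, supervised_contrastive_loss, multi_objective_loss), the feature dimensions, the mean pooling, and the loss weighting are all illustrative assumptions, not the authors' released SSESF implementation.

```python
# Minimal, assumption-based sketch of single-stage multi-modal fusion with
# a multi-objective (classification + supervised contrastive) loss.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SingleStageFusionSketch(nn.Module):
    """Project text and image features into a shared space, then encode and
    fuse them jointly with one shared transformer (single stage)."""

    def __init__(self, text_dim=768, image_dim=2048, hidden_dim=512,
                 num_layers=4, num_heads=8, num_classes=2):
        super().__init__()
        # Projection mechanism: map heterogeneous inputs to one shared dimension.
        self.text_proj = nn.Linear(text_dim, hidden_dim)
        self.image_proj = nn.Linear(image_dim, hidden_dim)
        # One shared encoder performs encoding and fusion in the same architecture.
        layer = nn.TransformerEncoderLayer(d_model=hidden_dim, nhead=num_heads,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=num_layers)
        self.classifier = nn.Linear(hidden_dim, num_classes)

    def forward(self, text_feats, image_feats):
        # text_feats: (B, T, text_dim); image_feats: (B, R, image_dim)
        tokens = torch.cat([self.text_proj(text_feats),
                            self.image_proj(image_feats)], dim=1)
        fused = self.encoder(tokens)    # joint encoding and cross-modal fusion
        pooled = fused.mean(dim=1)      # simple mean pooling over all tokens
        return self.classifier(pooled), pooled


def supervised_contrastive_loss(embeddings, labels, temperature=0.07):
    """Supervised contrastive term: pulls same-label samples together in a batch."""
    z = F.normalize(embeddings, dim=1)
    sim = z @ z.t() / temperature
    # Exclude self-similarity on the diagonal.
    self_mask = torch.eye(len(labels), dtype=torch.bool, device=z.device)
    sim = sim.masked_fill(self_mask, float('-inf'))
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)
    pos_mask = (labels.unsqueeze(0) == labels.unsqueeze(1)) & ~self_mask
    pos_counts = pos_mask.sum(dim=1).clamp(min=1)
    per_sample = -log_prob.masked_fill(~pos_mask, 0.0).sum(dim=1) / pos_counts
    return per_sample.mean()


def multi_objective_loss(logits, embeddings, labels, contrastive_weight=0.5):
    """Multi-objective optimization as a weighted sum of the two objectives."""
    return (F.cross_entropy(logits, labels)
            + contrastive_weight * supervised_contrastive_loss(embeddings, labels))
```

In this reading, the single forward pass through the shared encoder replaces the separate encode-then-fuse stages of multi-stage pipelines, and the contrastive term supplements the classification loss so that embeddings of same-label (sarcastic or non-sarcastic) examples are drawn together; the relative weighting of the two objectives is an assumed hyperparameter.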