Single-Stage Extensive Semantic Fusion for multi-modal sarcasm detection

IF 2.3 Q2 COMPUTER SCIENCE, THEORY & METHODS
Array. Pub Date: 2024-04-17. DOI: 10.1016/j.array.2024.100344
Hong Fang, Dahao Liang, Weiyu Xiang
Array, vol. 22, Article 100344. Journal Article. Full text: https://www.sciencedirect.com/science/article/pii/S2590005624000109
Citations: 0

Abstract

With the rise of social media and online interactions, there is a growing need for analytical models that can understand the nuanced, multi-modal communication inherent in these platforms, especially for detecting sarcasm. Existing research employs multi-stage models along with extensive semantic information extraction and single-modal encoders. These models often struggle to align and fuse multi-modal representations efficiently. To address these shortcomings, we introduce the Single-Stage Extensive Semantic Fusion (SSESF) model, which processes multi-modal inputs concurrently in a unified framework and performs encoding and fusion in the same architecture with shared parameters. A projection mechanism is employed to overcome the challenges posed by the diversity of inputs and the integration of a wide range of semantic information. Additionally, we design a multi-objective optimization that uses supervised contrastive learning to enhance the model’s ability to learn latent semantic nuances. The unified framework emphasizes the interaction and integration of multi-modal data, while the multi-objective optimization preserves the complexity of semantic nuances for sarcasm detection. Experimental results on a public multi-modal sarcasm dataset show that our model achieves state-of-the-art performance. The findings highlight the model’s capability to integrate extensive semantic information and demonstrate its effectiveness in simultaneously interpreting and fusing multi-modal data for sarcasm detection.
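The abstract does not spell out the form of the multi-objective optimization, so the following is only a minimal sketch of one common way to pair a classification loss with supervised contrastive learning. It assumes a softmax cross-entropy term plus a supervised contrastive term in the widely used "same label = positive pair" formulation. The function names, the `weight` mixing coefficient, and the `temperature` value are all hypothetical choices for illustration and are not taken from the paper.

```python
import math

def l2_normalize(v):
    # Scale a vector to unit length; a zero vector is returned unchanged.
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]

def cross_entropy(logits, labels):
    # Mean softmax cross-entropy over a batch of logit rows.
    total = 0.0
    for row, y in zip(logits, labels):
        m = max(row)  # subtract the max for numerical stability
        log_z = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_z - row[y]
    return total / len(labels)

def supervised_contrastive(features, labels, temperature=0.1):
    # Samples sharing the anchor's label are positives; every other
    # sample in the batch appears in the denominator.
    z = [l2_normalize(f) for f in features]
    n = len(z)
    total, anchors = 0.0, 0
    for i in range(n):
        positives = [j for j in range(n) if j != i and labels[j] == labels[i]]
        if not positives:
            continue  # an anchor without positives contributes nothing
        sims = {j: sum(a * b for a, b in zip(z[i], z[j])) / temperature
                for j in range(n) if j != i}
        m = max(sims.values())
        log_denom = m + math.log(sum(math.exp(s - m) for s in sims.values()))
        total += -sum(sims[j] - log_denom for j in positives) / len(positives)
        anchors += 1
    return total / anchors if anchors else 0.0

def joint_loss(logits, features, labels, weight=0.5, temperature=0.1):
    # Assumed combination: classification loss plus a weighted contrastive term.
    return (cross_entropy(logits, labels)
            + weight * supervised_contrastive(features, labels, temperature))

# Toy batch: two sarcastic (1) and two non-sarcastic (0) fused representations.
labels = [0, 0, 1, 1]
features = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
logits = [[2.0, 0.0], [1.5, 0.2], [0.1, 1.8], [0.0, 2.2]]
loss = joint_loss(logits, features, labels)
```

In this sketch the contrastive term pulls fused representations with the same sarcasm label together and pushes the others apart, while the cross-entropy term drives the final prediction. How SSESF actually weights or schedules its objectives is not stated in the abstract.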

Source journal: Array (Computer Science, General Computer Science)
CiteScore: 4.40
Self-citation rate: 0.00%
Articles published: 93
Review time: 45 days