Few-Shot In-Context Learning for Implicit Semantic Multimodal Content Detection and Interpretation

IF 11.1 · Region 1 (Engineering & Technology) · Q1 ENGINEERING, ELECTRICAL & ELECTRONIC
Xiuxian Wang;Lanjun Wang;Yuting Su;Hongshuo Tian;Guoqing Jin;An-An Liu
{"title":"基于上下文学习的隐式语义多模态内容检测与解释","authors":"Xiuxian Wang;Lanjun Wang;Yuting Su;Hongshuo Tian;Guoqing Jin;An-An Liu","doi":"10.1109/TCSVT.2025.3550900","DOIUrl":null,"url":null,"abstract":"In recent years, the field of explicit semantic multimodal content research makes significant progress. However, research on content with implicit semantics, such as online memes, remains insufficient. Memes often convey implicit semantics through metaphors and may sometimes contain hateful information. To address this issue, researchers propose a task for detecting hateful memes, opening up new avenues for exploring implicit semantics. The hateful meme detection currently faces two main problems: 1) the rapid emergence of meme content makes continuous tracking and detection difficult; 2) current methods often lack interpretability, which limits the understanding and trust in the detection results. To make a better understanding of memes, we analyze the definition of metaphor from social science and identify the three key factors of metaphor: socio-cultural knowledge, metaphorical tenor, and metaphorical representation pattern. According to these key factors, we guide a multimodal large language model (MLLM) to infer the metaphors expressed in memes step by step. Particularly, we propose a hateful meme detection and interpretation framework, which has four modules. We first leverage a multimodal generative search method to obtain socio-cultural knowledge relevant to visual objects of memes. Then, we use socio-cultural knowledge to instruct the MLLM to assess the social-cultural relevance scores between visual objects and textual information, and identify the metaphorical tenor of memes. Meanwhile, we apply a representative interpretation method to provide representative cases of memes and analyze these cases to explore metaphorical representation pattern. Finally, a chain-of-thought prompt is constructed to integrate the output of the above modules, guiding the MLLM to accurately detect and interpret hateful memes. Our method achieves state-of-the-art performance on three hateful meme detection benchmarks and performs better than supervised training models on the hateful meme interpretation benchmark.","PeriodicalId":13082,"journal":{"name":"IEEE Transactions on Circuits and Systems for Video Technology","volume":"35 9","pages":"9545-9558"},"PeriodicalIF":11.1000,"publicationDate":"2025-03-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Few-Shot In-Context Learning for Implicit Semantic Multimodal Content Detection and Interpretation\",\"authors\":\"Xiuxian Wang;Lanjun Wang;Yuting Su;Hongshuo Tian;Guoqing Jin;An-An Liu\",\"doi\":\"10.1109/TCSVT.2025.3550900\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"In recent years, the field of explicit semantic multimodal content research makes significant progress. However, research on content with implicit semantics, such as online memes, remains insufficient. Memes often convey implicit semantics through metaphors and may sometimes contain hateful information. To address this issue, researchers propose a task for detecting hateful memes, opening up new avenues for exploring implicit semantics. The hateful meme detection currently faces two main problems: 1) the rapid emergence of meme content makes continuous tracking and detection difficult; 2) current methods often lack interpretability, which limits the understanding and trust in the detection results. 
To make a better understanding of memes, we analyze the definition of metaphor from social science and identify the three key factors of metaphor: socio-cultural knowledge, metaphorical tenor, and metaphorical representation pattern. According to these key factors, we guide a multimodal large language model (MLLM) to infer the metaphors expressed in memes step by step. Particularly, we propose a hateful meme detection and interpretation framework, which has four modules. We first leverage a multimodal generative search method to obtain socio-cultural knowledge relevant to visual objects of memes. Then, we use socio-cultural knowledge to instruct the MLLM to assess the social-cultural relevance scores between visual objects and textual information, and identify the metaphorical tenor of memes. Meanwhile, we apply a representative interpretation method to provide representative cases of memes and analyze these cases to explore metaphorical representation pattern. Finally, a chain-of-thought prompt is constructed to integrate the output of the above modules, guiding the MLLM to accurately detect and interpret hateful memes. Our method achieves state-of-the-art performance on three hateful meme detection benchmarks and performs better than supervised training models on the hateful meme interpretation benchmark.\",\"PeriodicalId\":13082,\"journal\":{\"name\":\"IEEE Transactions on Circuits and Systems for Video Technology\",\"volume\":\"35 9\",\"pages\":\"9545-9558\"},\"PeriodicalIF\":11.1000,\"publicationDate\":\"2025-03-13\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"IEEE Transactions on Circuits and Systems for Video Technology\",\"FirstCategoryId\":\"5\",\"ListUrlMain\":\"https://ieeexplore.ieee.org/document/10925399/\",\"RegionNum\":1,\"RegionCategory\":\"工程技术\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"ENGINEERING, ELECTRICAL & ELECTRONIC\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Circuits and Systems for Video Technology","FirstCategoryId":"5","ListUrlMain":"https://ieeexplore.ieee.org/document/10925399/","RegionNum":1,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"ENGINEERING, ELECTRICAL & ELECTRONIC","Score":null,"Total":0}
Citations: 0

Abstract

In recent years, research on explicit semantic multimodal content has made significant progress. However, research on content with implicit semantics, such as online memes, remains insufficient. Memes often convey implicit semantics through metaphors and may sometimes contain hateful information. To address this issue, researchers have proposed the task of hateful meme detection, opening up new avenues for exploring implicit semantics. Hateful meme detection currently faces two main problems: 1) the rapid emergence of new meme content makes continuous tracking and detection difficult; 2) current methods often lack interpretability, which limits understanding of and trust in the detection results. To better understand memes, we analyze the definition of metaphor from social science and identify three key factors of metaphor: socio-cultural knowledge, metaphorical tenor, and metaphorical representation pattern. Guided by these key factors, we lead a multimodal large language model (MLLM) to infer the metaphors expressed in memes step by step. In particular, we propose a hateful meme detection and interpretation framework with four modules. We first leverage a multimodal generative search method to obtain socio-cultural knowledge relevant to the visual objects of memes. Then, we use this socio-cultural knowledge to instruct the MLLM to assess socio-cultural relevance scores between visual objects and textual information, and to identify the metaphorical tenor of memes. Meanwhile, we apply a representative interpretation method that retrieves representative meme cases and analyzes them to uncover the metaphorical representation pattern. Finally, a chain-of-thought prompt is constructed to integrate the outputs of the above modules, guiding the MLLM to accurately detect and interpret hateful memes. Our method achieves state-of-the-art performance on three hateful meme detection benchmarks and outperforms supervised training models on the hateful meme interpretation benchmark.
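The abstract describes a four-module pipeline whose outputs are merged into a single chain-of-thought prompt for the MLLM. As a rough illustration only, the Python sketch below shows how such a prompt might be assembled from the modules' outputs; every name here (ModuleOutputs, build_cot_prompt, and all field names) is a hypothetical placeholder, not the authors' implementation.

```python
from dataclasses import dataclass

@dataclass
class ModuleOutputs:
    """Hypothetical container for the outputs of the framework's first three modules."""
    socio_cultural_knowledge: list[str]   # from the multimodal generative search module
    relevance_scores: dict[str, float]    # visual object -> socio-cultural relevance score
    tenor: str                            # visual object judged to be the metaphorical tenor
    representative_cases: list[str]       # cases from the representative interpretation module

def build_cot_prompt(meme_text: str, out: ModuleOutputs) -> str:
    """Assemble a chain-of-thought prompt that walks an MLLM through the three
    metaphor factors before asking for a detection verdict and interpretation."""
    knowledge = "\n".join(f"- {k}" for k in out.socio_cultural_knowledge)
    scores = ", ".join(f"{obj}: {s:.2f}" for obj, s in out.relevance_scores.items())
    cases = "\n".join(f"- {c}" for c in out.representative_cases)
    return (
        "You are analysing an online meme for implicit, possibly hateful metaphor.\n"
        f"Meme text: {meme_text}\n\n"
        f"Step 1. Socio-cultural knowledge about the visual objects:\n{knowledge}\n\n"
        f"Step 2. Socio-cultural relevance scores between visual objects and the text ({scores}); "
        f"the metaphorical tenor is: {out.tenor}\n\n"
        f"Step 3. Representative cases sharing a similar metaphorical representation pattern:\n{cases}\n\n"
        "Step 4. Integrating the steps above, decide whether the meme is hateful "
        "and explain the metaphor it expresses."
    )

# Toy usage with placeholder values.
prompt = build_cot_prompt(
    meme_text="example meme caption",
    out=ModuleOutputs(
        socio_cultural_knowledge=["<knowledge snippet about the depicted figure>"],
        relevance_scores={"person in image": 0.91, "background sign": 0.12},
        tenor="person in image",
        representative_cases=["<similar meme and its interpretation>"],
    ),
)
print(prompt)
```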
Source journal: IEEE Transactions on Circuits and Systems for Video Technology
CiteScore: 13.80
Self-citation rate: 27.40%
Articles published per year: 660
Review time: 5 months
Journal description: The IEEE Transactions on Circuits and Systems for Video Technology (TCSVT) is dedicated to covering all aspects of video technologies from a circuits and systems perspective. We encourage submissions of general, theoretical, and application-oriented papers related to image and video acquisition, representation, presentation, and display. Additionally, we welcome contributions in areas such as processing, filtering, and transforms; analysis and synthesis; learning and understanding; compression, transmission, communication, and networking; as well as storage, retrieval, indexing, and search. Furthermore, papers focusing on hardware and software design and implementation are highly valued. Join us in advancing the field of video technology through innovative research and insights.