Multi-source Semantic Graph-based Multimodal Sarcasm Explanation Generation

Liqiang Jing, Xuemeng Song, Kun Ouyang, Mengzhao Jia, Liqiang Nie
{"title":"基于多源语义图的多模态讽刺解释生成","authors":"Liqiang Jing, Xuemeng Song, Kun Ouyang, Mengzhao Jia, Liqiang Nie","doi":"10.48550/arXiv.2306.16650","DOIUrl":null,"url":null,"abstract":"Multimodal Sarcasm Explanation (MuSE) is a new yet challenging task, which aims to generate a natural language sentence for a multimodal social post (an image as well as its caption) to explain why it contains sarcasm. Although the existing pioneer study has achieved great success with the BART backbone, it overlooks the gap between the visual feature space and the decoder semantic space, the object-level metadata of the image, as well as the potential external knowledge. To solve these limitations, in this work, we propose a novel mulTi-source sEmantic grAph-based Multimodal sarcasm explanation scheme, named TEAM. In particular, TEAM extracts the object-level semantic meta-data instead of the traditional global visual features from the input image. Meanwhile, TEAM resorts to ConceptNet to obtain the external related knowledge concepts for the input text and the extracted object meta-data. Thereafter, TEAM introduces a multi-source semantic graph that comprehensively characterize the multi-source (i.e., caption, object meta-data, external knowledge) semantic relations to facilitate the sarcasm reasoning. Extensive experiments on a public released dataset MORE verify the superiority of our model over cutting-edge methods.","PeriodicalId":352845,"journal":{"name":"Annual Meeting of the Association for Computational Linguistics","volume":"70 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2023-06-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Multi-source Semantic Graph-based Multimodal Sarcasm Explanation Generation\",\"authors\":\"Liqiang Jing, Xuemeng Song, Kun Ouyang, Mengzhao Jia, Liqiang Nie\",\"doi\":\"10.48550/arXiv.2306.16650\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Multimodal Sarcasm Explanation (MuSE) is a new yet challenging task, which aims to generate a natural language sentence for a multimodal social post (an image as well as its caption) to explain why it contains sarcasm. Although the existing pioneer study has achieved great success with the BART backbone, it overlooks the gap between the visual feature space and the decoder semantic space, the object-level metadata of the image, as well as the potential external knowledge. To solve these limitations, in this work, we propose a novel mulTi-source sEmantic grAph-based Multimodal sarcasm explanation scheme, named TEAM. In particular, TEAM extracts the object-level semantic meta-data instead of the traditional global visual features from the input image. Meanwhile, TEAM resorts to ConceptNet to obtain the external related knowledge concepts for the input text and the extracted object meta-data. Thereafter, TEAM introduces a multi-source semantic graph that comprehensively characterize the multi-source (i.e., caption, object meta-data, external knowledge) semantic relations to facilitate the sarcasm reasoning. 
Extensive experiments on a public released dataset MORE verify the superiority of our model over cutting-edge methods.\",\"PeriodicalId\":352845,\"journal\":{\"name\":\"Annual Meeting of the Association for Computational Linguistics\",\"volume\":\"70 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2023-06-29\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Annual Meeting of the Association for Computational Linguistics\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.48550/arXiv.2306.16650\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Annual Meeting of the Association for Computational Linguistics","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.48550/arXiv.2306.16650","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 0

Abstract

Multimodal Sarcasm Explanation (MuSE) is a new yet challenging task, which aims to generate a natural language sentence for a multimodal social post (an image as well as its caption) to explain why it contains sarcasm. Although the existing pioneer study has achieved great success with the BART backbone, it overlooks the gap between the visual feature space and the decoder semantic space, the object-level metadata of the image, as well as the potential external knowledge. To address these limitations, in this work, we propose a novel mulTi-source sEmantic grAph-based Multimodal sarcasm explanation scheme, named TEAM. In particular, TEAM extracts object-level semantic metadata from the input image instead of the traditional global visual features. Meanwhile, TEAM resorts to ConceptNet to obtain related external knowledge concepts for the input text and the extracted object metadata. Thereafter, TEAM introduces a multi-source semantic graph that comprehensively characterizes the multi-source (i.e., caption, object metadata, external knowledge) semantic relations to facilitate sarcasm reasoning. Extensive experiments on the publicly released MORE dataset verify the superiority of our model over cutting-edge methods.
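To make the two ingredients described above more concrete, the following is a minimal, illustrative sketch (not the authors' implementation) of (1) retrieving related knowledge concepts from the public ConceptNet API for caption words and detected object labels, and (2) assembling caption tokens, object metadata, and knowledge concepts into a single multi-source semantic graph. Object detection itself is omitted; the `object_labels` input, the helper names, and the edge/node attributes are assumptions made for illustration only.

```python
# Illustrative sketch only: this is not the TEAM model, just a toy pipeline
# showing ConceptNet concept retrieval and multi-source graph construction.
import requests
import networkx as nx


def conceptnet_neighbors(term, limit=5):
    """Query the public ConceptNet API for concepts related to `term`."""
    url = f"http://api.conceptnet.io/c/en/{term.lower()}"
    edges = requests.get(url, params={"limit": limit}, timeout=10).json().get("edges", [])
    neighbors = set()
    for e in edges:
        for side in ("start", "end"):
            label = e[side].get("label", "").lower()
            if label and label != term.lower():
                neighbors.add(label)
    return neighbors


def build_multisource_graph(caption_tokens, object_labels):
    """Link caption tokens, object labels, and their ConceptNet concepts."""
    g = nx.Graph()
    for tok in caption_tokens:
        g.add_node(tok, source="caption")
    for obj in object_labels:
        g.add_node(obj, source="object")
    # Attach external knowledge concepts to each caption/object node.
    for node in caption_tokens + object_labels:
        for concept in conceptnet_neighbors(node):
            g.add_node(concept, source="knowledge")
            g.add_edge(node, concept, relation="conceptnet")
    # Connect caption tokens with object labels so cross-modal relations are
    # available for downstream reasoning (e.g., graph attention).
    for tok in caption_tokens:
        for obj in object_labels:
            g.add_edge(tok, obj, relation="cross_modal")
    return g


if __name__ == "__main__":
    caption = ["what", "a", "wonderful", "traffic", "jam"]
    objects = ["car", "road"]  # hypothetical detector output
    graph = build_multisource_graph(caption, objects)
    print(graph.number_of_nodes(), "nodes,", graph.number_of_edges(), "edges")
```

In the paper itself such a graph would feed a generative backbone (e.g., BART) to produce the explanation sentence; the sketch stops at graph construction.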