Relevance-aware prompt-tuning method for multimodal social entity and relation extraction

Zhenbin Chen, Zhixin Li, Mingqi Liu, Canlong Zhang, Huifang Ma

Neurocomputing, Volume 640, Article 130316. Published 2025-05-07. DOI: 10.1016/j.neucom.2025.130316. Available at: https://www.sciencedirect.com/science/article/pii/S0925231225009889
Multimodal Named Entity Recognition (MNER) and Multimodal Relation Extraction (MRE) aim to identify specific entities in given text–image pairs and to classify the semantic relationships between them; both have significant applications in social media analysis. However, the images and text in social media data are not always aligned, so existing multimodal entity and relation extraction methods still rely mainly on textual information. Worse, mismatched images can introduce modality noise that degrades the model and prevents it from achieving better performance. To address this issue, we propose a Relevance-Aware Prompt-tuning (RAP) method with a dynamic router mechanism for multimodal entity and relation extraction. Our method adaptively learns effective multimodal features from various types of information as prompt vectors and uses prompt-tuning for entity and relation extraction. Additionally, when integrating information from different modalities, we take intermodal relevance into account to reduce the negative impact of mismatched visual information, which allows the model to overcome modality noise and achieve better performance. Extensive experiments on three benchmark tweet datasets demonstrate the effectiveness and superiority of the proposed approach, which achieves an approximately 2% improvement in F1 score on each of the three datasets.
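The abstract gives no implementation details, but its two core ideas, gating visual information by image–text relevance and injecting the gated features as soft prompt vectors, can be sketched in a few lines of PyTorch. The following is a minimal illustration, not the authors' RAP implementation: the module name, dimensions, mean pooling, and the sigmoid relevance scorer are all assumptions, and the paper's dynamic router over multiple information types is omitted.

```python
import torch
import torch.nn as nn


class RelevanceAwarePromptFusion(nn.Module):
    """Gates visual prompt vectors by an estimated image-text relevance score.

    Illustrative sketch only; module names and dimensions are assumptions,
    not the RAP paper's actual architecture.
    """

    def __init__(self, text_dim: int = 768, vis_dim: int = 768, n_prompts: int = 4):
        super().__init__()
        # Maps the pooled visual feature to n_prompts soft prompt vectors.
        self.vis_to_prompt = nn.Linear(vis_dim, n_prompts * text_dim)
        # Predicts a relevance score in (0, 1): near 0 for mismatched
        # text-image pairs, near 1 for well-aligned ones.
        self.relevance = nn.Sequential(
            nn.Linear(text_dim + vis_dim, text_dim),
            nn.Tanh(),
            nn.Linear(text_dim, 1),
            nn.Sigmoid(),
        )
        self.n_prompts = n_prompts
        self.text_dim = text_dim

    def forward(self, text_emb: torch.Tensor, vis_emb: torch.Tensor) -> torch.Tensor:
        # text_emb: (batch, seq_len, text_dim) token embeddings from a text encoder.
        # vis_emb:  (batch, vis_dim) pooled image feature from a vision encoder.
        pooled_text = text_emb.mean(dim=1)                                # (B, D_t)
        rel = self.relevance(torch.cat([pooled_text, vis_emb], dim=-1))  # (B, 1)
        prompts = self.vis_to_prompt(vis_emb)                             # (B, n*D_t)
        prompts = prompts.view(-1, self.n_prompts, self.text_dim)         # (B, n, D_t)
        # Scale the visual prompts by relevance so a mismatched image
        # contributes little, suppressing modality noise.
        prompts = rel.unsqueeze(-1) * prompts                              # (B, n, D_t)
        # Prepend the gated prompts to the token sequence; a downstream
        # encoder or tagger then performs entity and relation extraction.
        return torch.cat([prompts, text_emb], dim=1)                       # (B, n+L, D_t)


# Example usage with random tensors standing in for encoder outputs.
text_emb = torch.randn(2, 16, 768)
vis_emb = torch.randn(2, 768)
fused = RelevanceAwarePromptFusion()(text_emb, vis_emb)
print(fused.shape)  # torch.Size([2, 20, 768])
```

The key design choice this sketch captures is that the relevance gate is learned end to end, so the model itself decides how much each image should influence extraction rather than trusting every image equally.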
About the journal:
Neurocomputing publishes articles describing recent fundamental contributions in the field of neurocomputing. Its essential topics are neurocomputing theory, practice, and applications.