Relevance-aware prompt-tuning method for multimodal social entity and relation extraction

Zhenbin Chen, Zhixin Li, Mingqi Liu, Canlong Zhang, Huifang Ma

Neurocomputing, Volume 640, Article 130316. Published 2025-05-07. DOI: 10.1016/j.neucom.2025.130316. Available at: https://www.sciencedirect.com/science/article/pii/S0925231225009889
Multimodal Named Entity Recognition (MNER) and Multimodal Relation Extraction (MRE) aim to identify specific entities in given text–image pairs and to classify the semantic relationships between them; both have significant applications in social media analysis. However, the images and text in social media data are not always aligned, so existing multimodal entity and relation extraction methods still rely mainly on textual information. Worse, mismatched images can introduce modality noise that degrades the model and prevents it from achieving better performance. To address this issue, we propose a Relevance-Aware Prompt-tuning (RAP) method with a dynamic router mechanism for multimodal entity and relation extraction. Our method adaptively learns effective multimodal features from various types of information as prompt vectors and uses prompt-tuning for entity and relation extraction. Additionally, when integrating information from different modalities, we take intermodal relevance into account to reduce the negative impact of mismatched visual information, which allows the model to overcome modality noise and achieve better performance. Extensive experiments on three benchmark tweet datasets demonstrate the effectiveness and superiority of the proposed approach, which achieves an approximately 2% improvement in F1 score on each of the three datasets.
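The abstract gives no implementation details, but its two core ideas, gating visual information by image–text relevance and injecting the gated features as soft prompt vectors, can be sketched in a few lines of PyTorch. The following is a minimal illustration, not the authors' RAP implementation: the module name, dimensions, mean pooling, and the sigmoid relevance scorer are all assumptions, and the paper's dynamic router over multiple information types is omitted.

```python
import torch
import torch.nn as nn


class RelevanceAwarePromptFusion(nn.Module):
    """Gates visual prompt vectors by an estimated image-text relevance score.

    Illustrative sketch only; module names and dimensions are assumptions,
    not the RAP paper's actual architecture.
    """

    def __init__(self, text_dim: int = 768, vis_dim: int = 768, n_prompts: int = 4):
        super().__init__()
        # Maps the pooled visual feature to n_prompts soft prompt vectors.
        self.vis_to_prompt = nn.Linear(vis_dim, n_prompts * text_dim)
        # Predicts a relevance score in (0, 1): near 0 for mismatched
        # text-image pairs, near 1 for well-aligned ones.
        self.relevance = nn.Sequential(
            nn.Linear(text_dim + vis_dim, text_dim),
            nn.Tanh(),
            nn.Linear(text_dim, 1),
            nn.Sigmoid(),
        )
        self.n_prompts = n_prompts
        self.text_dim = text_dim

    def forward(self, text_emb: torch.Tensor, vis_emb: torch.Tensor) -> torch.Tensor:
        # text_emb: (batch, seq_len, text_dim) token embeddings from a text encoder.
        # vis_emb:  (batch, vis_dim) pooled image feature from a vision encoder.
        pooled_text = text_emb.mean(dim=1)                                # (B, D_t)
        rel = self.relevance(torch.cat([pooled_text, vis_emb], dim=-1))  # (B, 1)
        prompts = self.vis_to_prompt(vis_emb)                             # (B, n*D_t)
        prompts = prompts.view(-1, self.n_prompts, self.text_dim)         # (B, n, D_t)
        # Scale the visual prompts by relevance so a mismatched image
        # contributes little, suppressing modality noise.
        prompts = rel.unsqueeze(-1) * prompts                              # (B, n, D_t)
        # Prepend the gated prompts to the token sequence; a downstream
        # encoder or tagger then performs entity and relation extraction.
        return torch.cat([prompts, text_emb], dim=1)                       # (B, n+L, D_t)


# Example usage with random tensors standing in for encoder outputs.
text_emb = torch.randn(2, 16, 768)
vis_emb = torch.randn(2, 768)
fused = RelevanceAwarePromptFusion()(text_emb, vis_emb)
print(fused.shape)  # torch.Size([2, 20, 768])
```

The key design choice this sketch captures is that the relevance gate is learned end to end, so the model itself decides how much each image should influence extraction rather than trusting every image equally.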
About the journal:
Neurocomputing publishes articles describing recent fundamental contributions in the field of neurocomputing. Its essential topics are neurocomputing theory, practice, and applications.