A Fine-Grained Network for Joint Multimodal Entity-Relation Extraction
Authors: Li Yuan; Yi Cai; Jingyu Xu; Qing Li; Tao Wang
Journal: IEEE Transactions on Knowledge and Data Engineering, vol. 37, no. 1, pp. 1-14 (JCR Q1, Computer Science, Artificial Intelligence; impact factor 8.9)
Published: 2024-10-25
DOI: 10.1109/TKDE.2024.3485107
URL: https://ieeexplore.ieee.org/document/10736404/
Joint multimodal entity-relation extraction (JMERE) is a challenging task that combines two subtasks, named entity recognition and relation extraction, over multimodal data such as text sentences with associated images. Previous JMERE methods have primarily employed 1) pipeline models, which apply pre-trained unimodal models separately and ignore the interaction between tasks, or 2) word-pair relation tagging methods, which neglect neighboring word pairs. To address these limitations, we propose a fine-grained network for JMERE. Specifically, we introduce a fine-grained alignment module that uses a phrase-patch to establish connections between text phrases and visual objects, allowing the model to learn consistent multimodal representations from multimodal data. To handle task-irrelevant image information, we propose a gate fusion module, which mitigates the impact of image noise and ensures a balanced representation between image objects and text representations. Finally, we design a multi-word decoder that enables ensemble prediction of tags for each word pair; it leverages the predicted results of neighboring word pairs, improving the ability to extract multi-word entities. Evaluation results from a series of experiments show that the proposed model outperforms state-of-the-art models on JMERE.
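The abstract gives no equations, so the following is only a rough sketch of what a gate fusion step and a neighbor-aware word-pair prediction might look like. Everything in it is an assumption made for illustration: the scalar sigmoid gate, the additive fusion form, the five-cell neighborhood, and the names `gate_fusion` and `neighbor_ensemble` are hypothetical and are not taken from the paper.

```python
import math
from typing import List

Vector = List[float]


def sigmoid(x: float) -> float:
    """Logistic function mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def gate_fusion(text: Vector, image: Vector,
                w_text: Vector, w_image: Vector, bias: float) -> Vector:
    """Fuse one text vector with one image vector through a scalar gate.

    Assumed form (not the paper's formulation):
        g     = sigmoid(w_text . text + w_image . image + bias)
        fused = text + g * image
    A gate near 0 suppresses the image contribution, treating it as noise.
    """
    score = sum(w * t for w, t in zip(w_text, text))
    score += sum(w * v for w, v in zip(w_image, image))
    g = sigmoid(score + bias)
    return [t + g * v for t, v in zip(text, image)]


def neighbor_ensemble(scores: List[List[float]], i: int, j: int) -> float:
    """Average the score of word pair (i, j) with its in-grid neighbors.

    `scores` is a word-by-word grid holding one tag score per word pair.
    Neighbors falling outside the grid are skipped.
    """
    cells = [(i, j), (i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1)]
    valid = [scores[a][b] for a, b in cells
             if 0 <= a < len(scores) and 0 <= b < len(scores[a])]
    return sum(valid) / len(valid)
```

In this sketch a strongly negative gate score leaves the text representation almost unchanged, which is one simple way to picture how image noise could be filtered out, and averaging over adjacent cells of the word-pair grid shows how predictions for neighboring word pairs could inform a multi-word entity span.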
About the journal:
The IEEE Transactions on Knowledge and Data Engineering covers knowledge and data engineering within computer science, artificial intelligence, electrical engineering, computer engineering, and related fields. It provides an interdisciplinary platform for disseminating new developments in knowledge and data engineering and explores the practicality of these concepts in both hardware and software. Specific areas covered include knowledge-based and expert systems, AI techniques for knowledge and data management, tools and methodologies, distributed processing, real-time systems, architectures, data management practices, database design, query languages, security, fault tolerance, statistical databases, algorithms, performance evaluation, and applications.