Dual-Gated Fusion with Prefix-Tuning for Multi-Modal Relation Extraction

Qian Li, Shu Guo, Cheng Ji, Xutan Peng, Shiyao Cui, Jianxin Li
{"title":"Dual-Gated Fusion with Prefix-Tuning for Multi-Modal Relation Extraction","authors":"Qian Li, Shu Guo, Cheng Ji, Xutan Peng, Shiyao Cui, Jianxin Li","doi":"10.48550/arXiv.2306.11020","DOIUrl":null,"url":null,"abstract":"Multi-Modal Relation Extraction (MMRE) aims at identifying the relation between two entities in texts that contain visual clues. Rich visual content is valuable for the MMRE task, but existing works cannot well model finer associations among different modalities, failing to capture the truly helpful visual information and thus limiting relation extraction performance. In this paper, we propose a novel MMRE framework to better capture the deeper correlations of text, entity pair, and image/objects, so as to mine more helpful information for the task, termed as DGF-PT. We first propose a prompt-based autoregressive encoder, which builds the associations of intra-modal and inter-modal features related to the task, respectively by entity-oriented and object-oriented prefixes. To better integrate helpful visual information, we design a dual-gated fusion module to distinguish the importance of image/objects and further enrich text representations. In addition, a generative decoder is introduced with entity type restriction on relations, better filtering out candidates. Extensive experiments conducted on the benchmark dataset show that our approach achieves excellent performance compared to strong competitors, even in the few-shot situation.","PeriodicalId":352845,"journal":{"name":"Annual Meeting of the Association for Computational Linguistics","volume":"26 6","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2023-06-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"3","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Annual Meeting of the Association for Computational Linguistics","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.48550/arXiv.2306.11020","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 3

Abstract

Multi-Modal Relation Extraction (MMRE) aims to identify the relation between two entities in texts that contain visual clues. Rich visual content is valuable for the MMRE task, but existing works cannot adequately model the finer associations among different modalities, failing to capture the truly helpful visual information and thus limiting relation extraction performance. In this paper, we propose a novel MMRE framework, termed DGF-PT, to better capture the deeper correlations among the text, the entity pair, and the image/objects, so as to mine more helpful information for the task. We first propose a prompt-based autoregressive encoder, which builds intra-modal and inter-modal associations of task-related features via entity-oriented and object-oriented prefixes, respectively. To better integrate helpful visual information, we design a dual-gated fusion module that distinguishes the importance of the image and its objects and further enriches the text representations. In addition, a generative decoder with entity-type restriction on relations is introduced, which better filters out candidate relations. Extensive experiments conducted on the benchmark dataset show that our approach achieves excellent performance compared to strong competitors, even in few-shot settings.
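To make the dual-gated fusion idea concrete, below is a minimal PyTorch sketch of a fusion layer in the spirit of the abstract: two learned gates weigh the global image feature and the detected-object features before they enrich the text representation. The dimension names, the mean-pooling of objects, and the exact sigmoid gating form are assumptions for illustration, not the authors' implementation.

```python
import torch
import torch.nn as nn


class DualGatedFusion(nn.Module):
    """Illustrative dual-gated fusion: text representations are enriched by
    gated contributions from the global image and its detected objects."""

    def __init__(self, d_model: int = 768):
        super().__init__()
        # Gate over the global image feature, conditioned on the text.
        self.image_gate = nn.Linear(2 * d_model, d_model)
        # Gate over the pooled object features, conditioned on the text.
        self.object_gate = nn.Linear(2 * d_model, d_model)
        self.out = nn.Linear(d_model, d_model)

    def forward(self, text: torch.Tensor, image: torch.Tensor,
                objects: torch.Tensor) -> torch.Tensor:
        # text:    (batch, d_model)        pooled text representation
        # image:   (batch, d_model)        global image feature
        # objects: (batch, n_obj, d_model) detected-object features
        obj_pooled = objects.mean(dim=1)
        g_img = torch.sigmoid(self.image_gate(torch.cat([text, image], dim=-1)))
        g_obj = torch.sigmoid(self.object_gate(torch.cat([text, obj_pooled], dim=-1)))
        # Each gate decides, per dimension, how much visual signal flows in,
        # so unhelpful images or objects can be suppressed.
        fused = text + g_img * image + g_obj * obj_pooled
        return self.out(fused)


# Usage: fuse a batch of 2 examples, each with 5 detected objects.
fusion = DualGatedFusion(d_model=768)
t, v = torch.randn(2, 768), torch.randn(2, 768)
o = torch.randn(2, 5, 768)
print(fusion(t, v, o).shape)  # torch.Size([2, 768])
```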
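The entity-type restriction on relations can likewise be sketched as a masking step at decoding time: relations whose (head type, tail type) signature is incompatible with the given entity pair are removed before the softmax. The relation names and the type table below are illustrative placeholders, not taken from the paper.

```python
import torch

RELATIONS = ["per/place_of_birth", "per/member_of", "org/subsidiary"]
# Allowed (head_type, tail_type) signature per relation (assumed schema).
ALLOWED = {
    "per/place_of_birth": ("PER", "LOC"),
    "per/member_of": ("PER", "ORG"),
    "org/subsidiary": ("ORG", "ORG"),
}


def restrict_logits(logits: torch.Tensor, head_type: str,
                    tail_type: str) -> torch.Tensor:
    """Set logits of type-incompatible relations to -inf so the decoder
    can never emit them for this entity pair."""
    mask = torch.tensor([ALLOWED[r] == (head_type, tail_type) for r in RELATIONS])
    return logits.masked_fill(~mask, float("-inf"))


logits = torch.randn(len(RELATIONS))
# For a (PER, ORG) pair, only "per/member_of" remains a candidate.
print(restrict_logits(logits, "PER", "ORG"))
```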