Effective sentence-level relation extraction model using entity-centric dependency tree

IF 3.5 | Zone 4 (Computer Science) | JCR Q2: COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

PeerJ Computer Science. Published 2024-09-18. DOI: 10.7717/peerj-cs.2311

Seongsik Park, Harksoo Kim


Citations: 0

Abstract

The syntactic information of a dependency tree is an essential feature in relation extraction studies. Traditional dependency-based relation extraction methods can be categorized into hard pruning methods, which aim to remove unnecessary information, and soft pruning methods, which aim to utilize all lexical information. However, hard pruning can overlook important lexical information, while soft pruning can weaken the syntactic information between entities. As a result, recent studies in relation extraction have been shifting from dependency-based methods to pre-trained language model (LM)-based methods. Nonetheless, LM-based methods increasingly demand larger language models and additional data. This trend leads to higher resource consumption, longer training times, and increased computational costs, yet often yields only marginal performance improvements. To address this problem, we propose a relation extraction model based on an entity-centric dependency tree: a dependency tree that is reconstructed by treating entities as root nodes. Using the entity-centric dependency tree, the proposed method can capture the syntactic information of an input sentence without losing lexical information. Additionally, we propose a novel model that uses entity-centric dependency trees in conjunction with language models, enabling efficient relation extraction without the need for additional data or larger models. In experiments on representative sentence-level relation extraction datasets, namely TACRED, Re-TACRED, and SemEval 2010 Task 8, the proposed method achieves F1-scores of 74.9%, 91.2%, and 90.5%, respectively, which represents state-of-the-art performance.
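To make "a dependency tree reconstructed by treating entities as root nodes" concrete, the following is a minimal sketch, not the authors' implementation. It assumes the parse is given as 0-based head indices (-1 marks the original root), that each entity is collapsed to a single head token, and that re-rooting means re-orienting every dependency edge away from that token via breadth-first search. The function name `reroot_dependency_tree` and the example parse are hypothetical. Unlike hard pruning, every token stays in the tree; only the direction of the edges and each token's depth relative to the entity change.

```python
from collections import deque
from typing import List, Tuple


def reroot_dependency_tree(heads: List[int], root: int) -> Tuple[List[int], List[int]]:
    """Re-root a dependency tree at token `root` (hypothetical helper, 0-based).

    heads[i] is the head of token i, or -1 for the original root.
    Returns (new_heads, depth): new_heads[i] is token i's parent in the
    re-rooted tree (-1 for the entity token), depth[i] its distance from it.
    Tokens unreachable from `root` (malformed parse) keep depth -1.
    """
    n = len(heads)
    # Treat each head-dependent arc as an undirected edge.
    adj: List[List[int]] = [[] for _ in range(n)]
    for child, head in enumerate(heads):
        if head >= 0:
            adj[child].append(head)
            adj[head].append(child)

    new_heads = [-1] * n
    depth = [-1] * n
    depth[root] = 0
    queue = deque([root])
    # BFS from the entity token orients every edge away from it.
    while queue:
        node = queue.popleft()
        for nb in adj[node]:
            if depth[nb] == -1:
                depth[nb] = depth[node] + 1
                new_heads[nb] = node
                queue.append(nb)
    return new_heads, depth


if __name__ == "__main__":
    # Hypothetical UD-style parse of "Alice joined Acme in 2020".
    tokens = ["Alice", "joined", "Acme", "in", "2020"]
    heads = [1, -1, 1, 4, 1]
    # One tree per entity: subject-centric and object-centric.
    print(reroot_dependency_tree(heads, root=0))  # ([-1, 0, 1, 4, 1], [0, 1, 2, 3, 2])
    print(reroot_dependency_tree(heads, root=2))  # ([1, 2, -1, 4, 1], [2, 1, 0, 3, 2])
```

Building one tree per entity (here, one rooted at the subject "Alice" and one at the object "Acme") is an assumption based on the plural "entities as root nodes"; the paper may combine entities or handle multi-token spans differently.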


Source journal: PeerJ Computer Science (Computer Science: General Computer Science)

CiteScore: 6.10

Self-citation rate: 5.30%

Articles published: 332

Review time: 10 weeks

Journal description: PeerJ Computer Science is the new open access journal covering all subject areas in computer science, with the backing of a prestigious advisory board and more than 300 academic editors.