Document level Relationship Extraction based on context feature enhancement

IF 3.3, Zone 3 (Computer Science), JCR Q2 (COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE)
Nan Zhang, Ziming Cui, Qiang Cai
Pattern Recognition Letters, volume 197, pages 24-30. Journal Article, published 2025-07-17.
DOI: 10.1016/j.patrec.2025.07.006
URL: https://www.sciencedirect.com/science/article/pii/S0167865525002582
Citation count: 0

Abstract

Document-level Relationship Extraction (DocRE) aims to extract relationships among multiple entities from long texts. Obtaining feature representations for entity pairs that span multiple sentences is challenging, and the feature information for a triplet depends on both intra-document and inter-sentence information. To address this, this paper proposes Plus-DocRE (PDRE), a model for DocRE. First, we introduce span-based entity segmentation to increase the number of candidate entities and improve the recognition of negative samples. Second, we use the pre-trained BERT model to obtain paragraph and local context information, enriching the features of entity pairs. Finally, we fuse the local and paragraph context features through linear layers and self-attention mechanisms for multi-label relationship classification, which yields the extracted entity relationships. We also introduce a new data mechanism, C-DocRE, to simulate a more realistic scenario with annotation errors. Experimental results show that PDRE outperforms the baseline models, achieving an F1 score of 53.6.
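
The abstract does not spell out how the span-based entity segmentation works, so the following is only a minimal sketch of one common reading: enumerate every token span up to a maximum width inside each sentence, then treat spans that match no annotated entity as negative samples. The function names (`enumerate_spans`, `split_candidates`), the `max_width` parameter, and the toy document are all hypothetical and are not taken from the paper.

```python
from typing import List, Set, Tuple

Span = Tuple[int, int]  # (start, end) token offsets, end exclusive


def enumerate_spans(sentence_lengths: List[int], max_width: int) -> List[Span]:
    """List every span of at most max_width tokens that stays inside one sentence."""
    spans: List[Span] = []
    offset = 0  # document-level index of the current sentence's first token
    for length in sentence_lengths:
        sentence_end = offset + length
        for start in range(offset, sentence_end):
            # Clip the span so it never crosses the sentence boundary.
            last_end = min(start + max_width, sentence_end)
            for end in range(start + 1, last_end + 1):
                spans.append((start, end))
        offset = sentence_end
    return spans


def split_candidates(spans: List[Span], gold: Set[Span]) -> Tuple[List[Span], List[Span]]:
    """Separate spans that match an annotated entity from the remaining negatives."""
    positives = [s for s in spans if s in gold]
    negatives = [s for s in spans if s not in gold]
    return positives, negatives


# Toy document (assumed for illustration): two sentences of 4 and 3 tokens.
tokens = ["Marie", "Curie", "was", "Polish", "She", "discovered", "polonium"]
sentence_lengths = [4, 3]
gold_entities = {(0, 2), (6, 7)}  # "Marie Curie", "polonium"

spans = enumerate_spans(sentence_lengths, max_width=2)
positives, negatives = split_candidates(spans, gold_entities)
print(len(spans), len(positives), len(negatives))
```

In this reading, the negatives give a classifier explicit examples of spans that are not entities, which is one plausible way to "improve negative sample recognition"; the paper itself may construct and use candidates differently.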
Source journal
Pattern Recognition Letters (Engineering and Technology, Computer Science: Artificial Intelligence)
CiteScore: 12.40
Self-citation rate: 5.90%
Articles published: 287
Review time: 9.1 months
Journal description: Pattern Recognition Letters aims at rapid publication of concise articles of a broad interest in pattern recognition. Subject areas include all the current fields of interest represented by the Technical Committees of the International Association of Pattern Recognition, and other developing themes involving learning and recognition.