An adaptive confidence-based data revision framework for Document-level Relation Extraction
Authors: Chao Jiang, Jinzhi Liao, Xiang Zhao, Daojian Zeng, Jianhua Dai
Journal: Information Processing & Management, Volume 62, Issue 1, Article 103909 (Impact Factor 7.4; JCR Q1, Computer Science, Information Systems)
Published: 2024-09-26
DOI: 10.1016/j.ipm.2024.103909
URL: https://www.sciencedirect.com/science/article/pii/S0306457324002681
Noisy annotations have become a key issue limiting Document-level Relation Extraction (DocRE). Previous research addressed the problem through manual re-annotation. However, this handcrafted strategy is inefficient, incurs high human costs, and cannot be generalized to large-scale datasets. To address the problem, we construct a confidence-based Revision framework for DocRE (ReD), aiming to achieve high-quality automatic data revision. Specifically, we first introduce a denoising training module to recognize relational facts and prevent noisy annotations. Second, a confidence-based data revision module performs adaptive data revision for long-tail distributed relational facts. After the data revision, an iterative training module creates a virtuous cycle, transforming the revised data into useful training data that supports further revision. By capitalizing on ReD, we propose ReD-DocRED, which consists of 101,873 revised annotated documents from DocRED. ReD-DocRED introduces 57.1% new relational facts, and models trained on ReD-DocRED achieve significant F1 improvements, ranging from 6.35 to 16.55. The experimental results demonstrate that ReD can achieve high-quality data revision and, to some extent, replace manual labeling.
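The abstract does not spell out how the confidence-based revision step works, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes a hypothetical rule in which rarer (long-tail) relation types get a lower confidence threshold, and a predicted fact is added to the annotations when its confidence clears the threshold for its relation. The function names, the `base` and `floor` parameters, and the toy data are all illustrative assumptions.

```python
from collections import Counter


def adaptive_thresholds(labels, base=0.9, floor=0.6):
    """Hypothetical rule: the rarer a relation, the lower its threshold.

    labels: iterable of (head, tail, relation) triples.
    The most frequent relation gets roughly `base`; rarer ones move toward `floor`.
    """
    counts = Counter(rel for _, _, rel in labels)
    max_count = max(counts.values())
    return {
        rel: floor + (base - floor) * (n / max_count)
        for rel, n in counts.items()
    }


def revise(labels, predictions, thresholds, default=0.9):
    """Add predicted facts whose confidence clears the per-relation threshold.

    predictions: dict mapping (head, tail, relation) -> model confidence.
    Relations missing from `thresholds` fall back to `default`.
    """
    revised = set(labels)
    for (head, tail, rel), conf in predictions.items():
        if conf >= thresholds.get(rel, default):
            revised.add((head, tail, rel))
    return revised


# Toy annotations: "country" is frequent, "sibling" is long-tail.
labels = [
    ("A", "B", "country"),
    ("C", "D", "country"),
    ("E", "F", "country"),
    ("A", "C", "sibling"),
]

# Toy model confidences for facts missing from the annotations.
predictions = {
    ("G", "H", "country"): 0.85,  # below the threshold for a frequent relation
    ("G", "I", "country"): 0.95,  # above it
    ("B", "D", "sibling"): 0.75,  # above the lowered long-tail threshold
    ("B", "E", "sibling"): 0.65,  # below it
}

thresholds = adaptive_thresholds(labels)
revised = revise(labels, predictions, thresholds)
```

In an iterative setup like the one the abstract describes, the revised set would serve as training data for the next round, and the thresholds would be recomputed from it. That loop is omitted here because it depends on the underlying DocRE model.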
About the journal:
Information Processing and Management is dedicated to publishing cutting-edge original research at the convergence of computing and information science. Our scope encompasses theory, methods, and applications across various domains, including advertising, business, health, information science, information technology marketing, and social computing.
We aim to cater to the interests of both primary researchers and practitioners by offering an effective platform for the timely dissemination of advanced and topical issues in this interdisciplinary field. The journal places particular emphasis on original research articles, research survey articles, research method articles, and articles addressing critical applications of research. Join us in advancing knowledge and innovation at the intersection of computing and information science.