Biomedical event causal relation extraction with deep knowledge fusion and Roberta-based data augmentation

Impact factor 4.2; Zone 3 (Biology); JCR Q1 (Biochemical Research Methods)
Lishuang Li, Yi Xiang, Jing Hao
Journal: Methods, volume 231, pages 8-14
Publication type: Journal Article
Published: 2024-09-04
DOI: 10.1016/j.ymeth.2024.08.007
URL: https://www.sciencedirect.com/science/article/pii/S1046202324001889

Abstract

Biomedical event causal relation extraction (BECRE), a subtask of biomedical information extraction, aims to extract causal relations between events from unstructured biomedical text and plays an essential role in many downstream tasks. Existing work has two main problems: (i) shallow features alone offer limited help in establishing potential relationships between biomedical events; (ii) addressing the data imbalance of BECRE tasks with traditional oversampling ignores the need for data diversity. This paper proposes a novel BECRE method that tackles both problems using deep knowledge fusion and Roberta-based data augmentation. To address the first problem, we fuse deep knowledge, comprising structural event representations and entity relation paths, to establish potential semantic connections between biomedical events. We use a Graph Convolutional Neural Network (GCN) and the predicated tensor model to obtain structural event representations, and we encode entity relation paths from external knowledge bases (GTD, CDR, CHR, GDA and UMLS). A triplet attention mechanism then fuses the structural event representations with the entity relation path information. To address the second problem, we propose a Roberta-based data augmentation method: words in the biomedical text, other than the biomedical events themselves, are masked randomly at a set proportion, and a pre-trained Roberta model then generates new data instances for the imbalanced BECRE dataset. Extensive experiments on Hahn-Powell's and BioCause datasets confirm that the proposed method achieves state-of-the-art performance compared with current approaches.
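
The masking step of the data augmentation described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `mask_non_event_tokens`, the whitespace tokenization, the example sentence, the event positions, and the masking ratio are all assumptions chosen for illustration. The only detail taken from RoBERTa itself is its mask token string, `<mask>`. The second stage, in which a pre-trained masked language model fills the masked positions to produce a new training instance, is not shown.

```python
import random

MASK_TOKEN = "<mask>"  # mask token string used by RoBERTa tokenizers


def mask_non_event_tokens(tokens, event_indices, mask_ratio=0.15, seed=None):
    """Return a copy of `tokens` with a proportion of non-event tokens masked.

    tokens        -- list of word strings (assumed whitespace tokenization)
    event_indices -- positions of event words that must never be masked
    mask_ratio    -- fraction of the remaining words to mask (assumed value)
    seed          -- optional seed so the random choice is reproducible
    """
    rng = random.Random(seed)
    protected = set(event_indices)
    # Only words outside the biomedical events are candidates for masking.
    candidates = [i for i in range(len(tokens)) if i not in protected]
    if not candidates:
        return list(tokens)
    # Mask at least one word so that every call yields a changed sentence.
    n_mask = max(1, round(len(candidates) * mask_ratio))
    chosen = set(rng.sample(candidates, n_mask))
    return [MASK_TOKEN if i in chosen else tok for i, tok in enumerate(tokens)]


# Hypothetical sentence; the event words are at positions 0 and 6.
tokens = "Phosphorylation of STAT3 leads to increased expression of IL-6".split()
event_indices = [0, 6]
masked = mask_non_event_tokens(tokens, event_indices, mask_ratio=0.3, seed=0)
print(" ".join(masked))
```

Protecting the event positions keeps the labeled events intact, so the causal relation label of the original instance can plausibly carry over to the generated one, while the surrounding context varies. That variation is what plain oversampling, which only duplicates instances, does not provide.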

Source journal: Methods
Category: Biology, Biochemical Research Methods
CiteScore: 9.80
Self-citation rate: 2.10%
Articles published: 222
Review time: 11.3 weeks
Journal description: Methods focuses on rapidly developing techniques in the experimental biological and medical sciences. Each topical issue, organized by a guest editor who is an expert in the area covered, consists solely of invited quality articles by specialist authors, many of them reviews. Issues are devoted to specific technical approaches with emphasis on clear detailed descriptions of protocols that allow them to be reproduced easily. The background information provided enables researchers to understand the principles underlying the methods; other helpful sections include comparisons of alternative methods giving the advantages and disadvantages of particular methods, guidance on avoiding potential pitfalls, and suggestions for troubleshooting.