{"title":"Biomedical event causal relation extraction with deep knowledge fusion and Roberta-based data augmentation","authors":"Lishuang Li, Yi Xiang, Jing Hao","doi":"10.1016/j.ymeth.2024.08.007","DOIUrl":null,"url":null,"abstract":"<div><p>Biomedical event causal relation extraction (BECRE), as a subtask of biomedical information extraction, aims to extract event causal relation facts from unstructured biomedical texts and plays an essential role in many downstream tasks. The existing works have two main problems: i) Only shallow features are limited in helping the model establish potential relationships between biomedical events. ii) Using the traditional oversampling method to solve the data imbalance problem of the BECRE tasks ignores the requirements for data diversifying. This paper proposes a novel biomedical event causal relation extraction method to solve the above problems using deep knowledge fusion and Roberta-based data augmentation. To address the first problem, we fuse deep knowledge, including structural event representation and entity relation path, for establishing potential semantic connections between biomedical events. We use the Graph Convolutional Neural network (GCN) and the predicated tensor model to acquire structural event representation, and entity relation paths are encoded based on the external knowledge bases (GTD, CDR, CHR, GDA and UMLS). We introduce the triplet attention mechanism to fuse structural event representation and entity relation path information. Besides, this paper proposes the Roberta-based data augmentation method to address the second problem, some words of biomedical text, except biomedical events, are masked proportionally and randomly, and then pre-trained Roberta generates data instances for the imbalance BECRE dataset. Extensive experimental results on Hahn-Powell's and BioCause datasets confirm that the proposed method achieves state-of-the-art performance compared to current advances.</p></div>","PeriodicalId":390,"journal":{"name":"Methods","volume":"231 ","pages":"Pages 8-14"},"PeriodicalIF":4.2000,"publicationDate":"2024-09-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Methods","FirstCategoryId":"99","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S1046202324001889","RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"BIOCHEMICAL RESEARCH METHODS","Score":null,"Total":0}
引用次数: 0
Abstract
Biomedical event causal relation extraction (BECRE), as a subtask of biomedical information extraction, aims to extract event causal relation facts from unstructured biomedical texts and plays an essential role in many downstream tasks. The existing works have two main problems: i) Only shallow features are limited in helping the model establish potential relationships between biomedical events. ii) Using the traditional oversampling method to solve the data imbalance problem of the BECRE tasks ignores the requirements for data diversifying. This paper proposes a novel biomedical event causal relation extraction method to solve the above problems using deep knowledge fusion and Roberta-based data augmentation. To address the first problem, we fuse deep knowledge, including structural event representation and entity relation path, for establishing potential semantic connections between biomedical events. We use the Graph Convolutional Neural network (GCN) and the predicated tensor model to acquire structural event representation, and entity relation paths are encoded based on the external knowledge bases (GTD, CDR, CHR, GDA and UMLS). We introduce the triplet attention mechanism to fuse structural event representation and entity relation path information. Besides, this paper proposes the Roberta-based data augmentation method to address the second problem, some words of biomedical text, except biomedical events, are masked proportionally and randomly, and then pre-trained Roberta generates data instances for the imbalance BECRE dataset. Extensive experimental results on Hahn-Powell's and BioCause datasets confirm that the proposed method achieves state-of-the-art performance compared to current advances.
期刊介绍:
Methods focuses on rapidly developing techniques in the experimental biological and medical sciences.
Each topical issue, organized by a guest editor who is an expert in the area covered, consists solely of invited quality articles by specialist authors, many of them reviews. Issues are devoted to specific technical approaches with emphasis on clear detailed descriptions of protocols that allow them to be reproduced easily. The background information provided enables researchers to understand the principles underlying the methods; other helpful sections include comparisons of alternative methods giving the advantages and disadvantages of particular methods, guidance on avoiding potential pitfalls, and suggestions for troubleshooting.