Modeling of joint extraction of entity relationships in clinical electronic medical records

IF 7 Zone 2 (Medicine) Q1 BIOLOGY

Computers in biology and medicine Pub Date : 2024-09-18 DOI:10.1016/j.compbiomed.2024.109161



Abstract

Advancing medical informatization requires extracting entities and their relationships from electronic medical records (EMRs). Current research on EMRs concentrates mainly on single-entity relationship extraction. Clinical EMRs, however, frequently contain complex, overlapping entity relationships, which makes information extraction harder. To address the lack of a clinical medical relationship extraction dataset, this study uses the electronic medical records of 584 patients from one hospital to build a compact clinical medical relationship extraction dataset. Pipelined relationship extraction models overlook the one-to-many correspondence between entities and relationships; to address this limitation, the paper introduces a cascading relationship extraction model. The model combines the MacBERT pre-trained model, a gated recurrent network, and a multi-head self-attention mechanism to improve text feature extraction, and it incorporates adversarial learning to improve robustness. One-to-many relationships between entities are handled as a two-phase task: the main entity is predicted first, and then the associated objects and their correspondences are predicted. This cascade structure lets the model handle intricate entity relationships flexibly, which improves extraction accuracy. Experimental results demonstrate the model's effectiveness, with F1-scores of 82.8%, 76.8%, and 88.2% on the DuIE, CHIP-CDEE, and private datasets, respectively. These scores improve on the benchmark model. The findings indicate that the model is applicable in practical domains, particularly in tasks such as biomedical information extraction.
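The two-phase idea in the abstract (predict the main entity, then its associated objects per relation) can be illustrated with a minimal sketch. It assumes a pointer-style decoding in which start and end probabilities mark entity spans; the abstract does not specify the paper's actual decoding, so this may differ from it. The function names `decode_spans`, `cascade_decode`, and `triple_f1`, the 0.5 threshold, the relation label `has_symptom`, and the toy probabilities are all hypothetical and stand in for the outputs of a trained encoder.

```python
from typing import Callable, Dict, List, Set, Tuple

Span = Tuple[int, int]          # inclusive (start, end) token indices
Triple = Tuple[str, str, str]   # (subject text, relation, object text)
Pointers = Tuple[List[float], List[float]]  # (start probs, end probs)


def decode_spans(start_probs: List[float], end_probs: List[float],
                 threshold: float = 0.5) -> List[Span]:
    """Pair each start above the threshold with the nearest end at or after it."""
    spans: List[Span] = []
    for i, p_start in enumerate(start_probs):
        if p_start < threshold:
            continue
        for j in range(i, len(end_probs)):
            if end_probs[j] >= threshold:
                spans.append((i, j))
                break
    return spans


def cascade_decode(tokens: List[str],
                   subject_pointers: Pointers,
                   object_tagger: Callable[[Span], Dict[str, Pointers]],
                   threshold: float = 0.5) -> Set[Triple]:
    """Phase 1 finds subject spans; phase 2 finds objects per relation for each subject."""
    triples: Set[Triple] = set()
    subj_start, subj_end = subject_pointers
    for s_start, s_end in decode_spans(subj_start, subj_end, threshold):
        subject = " ".join(tokens[s_start:s_end + 1])
        # The object tagger is conditioned on the subject span, so one subject
        # can yield several objects, possibly under several relations.
        for relation, (obj_start, obj_end) in object_tagger((s_start, s_end)).items():
            for o_start, o_end in decode_spans(obj_start, obj_end, threshold):
                triples.add((subject, relation, " ".join(tokens[o_start:o_end + 1])))
    return triples


def triple_f1(predicted: Set[Triple], gold: Set[Triple]) -> float:
    """Exact-match F1 over (subject, relation, object) triples."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy, invented example: one subject linked to two objects (one-to-many).
tokens = ["pneumonia", "presents", "with", "cough", "and", "fever"]
subject_pointers = ([0.9, 0.0, 0.0, 0.0, 0.0, 0.0],
                    [0.9, 0.0, 0.0, 0.0, 0.0, 0.0])


def toy_object_tagger(subject_span: Span) -> Dict[str, Pointers]:
    """Stand-in for a trained relation-specific object tagger (hypothetical)."""
    if subject_span == (0, 0):
        return {"has_symptom": ([0.0, 0.0, 0.0, 0.9, 0.0, 0.8],
                                [0.0, 0.0, 0.0, 0.9, 0.0, 0.8])}
    return {}


triples = cascade_decode(tokens, subject_pointers, toy_object_tagger)
```

Because objects are decoded separately for each subject and relation, overlapping triples that share a subject come out naturally, which is the case a pipelined extractor that assumes one relation per entity pair tends to miss.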




Source journal

Computers in biology and medicine (Engineering and Technology: Biomedical Engineering)

CiteScore: 11.70

Self-citation rate: 10.40%

Articles published: 1086

Review time: 74 days

About the journal: Computers in Biology and Medicine is an international forum for sharing groundbreaking advancements in the use of computers in bioscience and medicine. This journal serves as a medium for communicating essential research, instruction, ideas, and information regarding the rapidly evolving field of computer applications in these domains. By encouraging the exchange of knowledge, we aim to facilitate progress and innovation in the utilization of computers in biology and medicine.