A Knowledge-Guided Event-Relation Graph Learning Network for Patient Similarity With Chinese Electronic Medical Records
Authors: Zhichao Zhu; Jianqiang Li; Chun Xu; Jingchen Zou; Qing Zhao
Journal: IEEE Transactions on Big Data, vol. 11, no. 3, pp. 1475-1492 (Impact Factor 7.5; JCR Q1, Computer Science, Information Systems)
Published: 2024-10-16
DOI: 10.1109/TBDATA.2024.3481955
URL: https://ieeexplore.ieee.org/document/10720035/
Citations: 0
Abstract
Feature sparsity is a common problem in patient similarity calculation with clinical data. To tackle it, some approaches use Graph Neural Networks (GNNs) to model the complex structural information in patient Electronic Medical Records (EMRs). These GNN-based approaches usually treat medical concepts (e.g., symptoms, diseases) as nodes to learn spatial features and adopt a Recurrent Neural Network (RNN) to learn the temporal sequence of these concepts. However, in many cases, several concepts that appear sequentially in EMR text are considered to occur simultaneously in clinical diagnosis (e.g., some symptoms are detected together by a single test), so learning a temporal sequence over these concepts might introduce noise into patient similarity calculation. Furthermore, the limited discriminative capability of concepts cannot provide sufficient indicative information for similarity learning. To this end, we propose a Knowledge-guided Event-relation Graph Learning Network (KEGLN) for patient similarity calculation. Specifically, after event extraction, we first construct element-relation graphs and use the first Graph Convolutional Network (GCN) and Graph Attention Network (GAT) layer to aggregate features from each event and its involved elements, reducing the noise produced by the temporal sequence of concepts. Meanwhile, entity descriptions and attribute-value structures are extracted to supplement the background knowledge of elements (concepts and trigger words). For the updated event nodes, we then design an event-relation graph and adopt the second GCN and GAT layer to aggregate information from events and their direct neighbors, extracting the spatial features of events at the current moment. Finally, a Bidirectional Long Short-Term Memory (BiLSTM) model is adopted to learn the temporal dependency of event nodes and capture the dynamic change of disease progression. In extensive experiments on diverse datasets, our KEGLN model outperforms all baselines for Chinese patient similarity calculation.
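The two graph aggregation steps named in the abstract can be illustrated with a minimal, dependency-free sketch. It is not the authors' implementation: the function names, the toy graph, and the feature values are all hypothetical, there are no learned weights or nonlinearities, plain dot products stand in for GAT's learned scoring function, and the BiLSTM stage is omitted. It only shows how a GCN-style step and an attention-style step each mix an event node's features with those of its neighbors.

```python
import math


def build_neighborhoods(num_nodes, edges):
    """Return sorted neighbor lists for an undirected graph, with self-loops added."""
    neighborhoods = [{i} for i in range(num_nodes)]
    for u, v in edges:
        neighborhoods[u].add(v)
        neighborhoods[v].add(u)
    return [sorted(nb) for nb in neighborhoods]


def gcn_aggregate(features, edges):
    """One GCN-style propagation step with symmetric degree normalization.

    Assumption: no weight matrix and no activation, only the neighborhood mixing.
    """
    nbrs = build_neighborhoods(len(features), edges)
    degree = [len(nb) for nb in nbrs]
    dim = len(features[0])
    out = []
    for i, nb in enumerate(nbrs):
        row = [0.0] * dim
        for j in nb:
            weight = 1.0 / math.sqrt(degree[i] * degree[j])
            for k in range(dim):
                row[k] += weight * features[j][k]
        out.append(row)
    return out


def attention_aggregate(features, edges):
    """One attention-style step: softmax-weighted average over each neighborhood.

    Assumption: scores are raw dot products, not GAT's learned attention scores.
    """
    nbrs = build_neighborhoods(len(features), edges)
    dim = len(features[0])
    out = []
    for i, nb in enumerate(nbrs):
        scores = [sum(a * b for a, b in zip(features[i], features[j])) for j in nb]
        top = max(scores)  # subtract the max for a numerically stable softmax
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        row = [0.0] * dim
        for e, j in zip(exps, nb):
            for k in range(dim):
                row[k] += (e / total) * features[j][k]
        out.append(row)
    return out


# Hypothetical toy element-relation graph: node 0 is an event,
# nodes 1 and 2 are its elements (for example a trigger word and a concept).
toy_features = [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
toy_edges = [(0, 1), (0, 2)]

event_after_gcn = gcn_aggregate(toy_features, toy_edges)[0]
event_after_attention = attention_aggregate(toy_features, toy_edges)[0]
print(event_after_gcn, event_after_attention)
```

In both cases the event node ends up with a representation that blends its own features with those of its elements, without imposing any order on the elements. That order-free mixing is the property the abstract relies on to avoid noise from artificial temporal sequences.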
About the journal:
The IEEE Transactions on Big Data publishes peer-reviewed articles focusing on big data. These articles present innovative research ideas and application results across disciplines, including novel theories, algorithms, and applications. Research areas cover a wide range, such as big data analytics, visualization, curation, management, semantics, infrastructure, standards, performance analysis, intelligence extraction, scientific discovery, security, privacy, and legal issues specific to big data. The journal also prioritizes applications of big data in fields generating massive datasets.