Adaptive Integration of Categorical and Multi-relational Ontologies with EHR Data for Medical Concept Embedding

IF 6.6 | Zone 4 (Computer Science) | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

ACM Transactions on Intelligent Systems and Technology Pub Date : 2023-11-14 DOI:10.1145/3625224

Chin Wang Cheong, Kejing Yin, William K. Cheung, Benjamin C. M. Fung, Jonathan Poon


Citations: 0

Abstract

Representation learning has been applied to Electronic Health Records (EHR) for medical concept embedding and downstream predictive analytics tasks with promising results. Medical ontologies can also be integrated to guide the learning so that the embedding space better aligns with existing medical knowledge. Yet carrying out the integration properly is non-trivial. Medical concepts that are similar according to a medical ontology may not necessarily be close in the embedding space learned from the EHR data, as medical ontologies organize medical concepts for their own specific objectives. Any integration methodology that does not consider this underlying inconsistency will result in sub-optimal medical concept embedding and, in turn, degrade the performance of the downstream tasks. In this article, we propose a novel representation learning framework called ADORE (ADaptive Ontological REpresentations) that allows the medical ontologies to adapt their structures for more robust integration with the EHR data. ADORE first learns multiple embeddings for each category in the ontology via an attention mechanism. At the same time, it supports an adaptive integration of categorical and multi-relational ontologies in the embedding space using a category-aware graph attention network. We evaluate the performance of ADORE on a number of predictive analytics tasks using two EHR datasets. Our experimental results show that the medical concept embeddings obtained by ADORE can outperform the state-of-the-art methods on all the tasks. More importantly, ADORE can produce clinically meaningful sub-categorization of the existing ontological categories and yield attention values that further enhance model interpretability.
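The abstract does not spell out how multiple embeddings per category are combined, so the following is only a minimal sketch of the general idea, not ADORE's actual formulation. It assumes a concept vector learned from EHR data, a handful of candidate sub-category vectors for its ontology category, and plain dot-product attention with a softmax. The function names (`softmax`, `dot`, `attend_category`) and the toy vectors are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [x / total for x in exps]


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def attend_category(concept_vec, category_vecs):
    """Weight several embeddings of one category by similarity to a concept.

    Assumption: dot-product scoring followed by softmax. The scoring
    function used in the paper may differ.
    Returns (attention weights, weighted sum of the category embeddings).
    """
    scores = [dot(concept_vec, c) for c in category_vecs]
    weights = softmax(scores)
    dim = len(concept_vec)
    context = [
        sum(w * c[i] for w, c in zip(weights, category_vecs))
        for i in range(dim)
    ]
    return weights, context


# Toy data (hypothetical): one concept and two sub-category embeddings.
concept = [1.0, 0.0]
category = [[2.0, 0.0], [0.0, 2.0]]
weights, context = attend_category(concept, category)
```

In this toy case the first sub-category embedding points in the same direction as the concept, so it receives the larger attention weight. Inspecting such weights is the kind of signal the abstract refers to when it mentions attention values that support interpretability.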


Source journal

ACM Transactions on Intelligent Systems and Technology (COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE; COMPUTER SCIENCE, INFORMATION SYSTEMS)

CiteScore: 9.30

Self-citation rate: 2.00%

Publication volume: 131

Journal description: ACM Transactions on Intelligent Systems and Technology is a scholarly journal that publishes the highest-quality papers on intelligent systems, applicable algorithms and technology with a multi-disciplinary perspective. An intelligent system is one that uses artificial intelligence (AI) techniques to offer important services (e.g., as a component of a larger system) to allow integrated systems to perceive, reason, learn, and act intelligently in the real world. ACM TIST is published bimonthly (six issues a year). Each issue has 8-11 regular papers, with around 20 published journal pages or 10,000 words per paper. Additional references, proofs, graphs or detailed experiment results can be submitted as a separate appendix, while excessively lengthy papers will be rejected automatically. Authors can include online-only appendices for additional content of their published papers and are encouraged to share their code and/or data with other readers.