Extending electronic medical records vector models with knowledge graphs to improve hospitalization prediction.

IF 2, Zone 3 (Engineering & Technology), Q3 MATHEMATICAL & COMPUTATIONAL BIOLOGY

Journal of Biomedical Semantics, Pub Date: 2022-02-22, DOI: 10.1186/s13326-022-00261-9

Raphaël Gazzotti, Catherine Faron, Fabien Gandon, Virginie Lacroix-Hugues, David Darmon


Citations: 1

Abstract

Background: Artificial intelligence methods applied to electronic medical records (EMRs) hold the potential to help physicians save time by sharpening their analysis and decisions, thereby improving the health of patients. On the one hand, machine learning algorithms have proven effective at extracting information and exploiting knowledge derived from data. On the other hand, knowledge graphs capture human knowledge by relying on conceptual schemas and formalization, and they support reasoning. Knowledge graphs are plentiful in the medical field, and they can be used to pre-process and enrich the data representations used by machine learning algorithms. Medical data standardization is an opportunity to jointly exploit the richness of knowledge graphs and the capabilities of machine learning algorithms.

Methods: We address the problem of predicting patient hospitalization with an approach that enriches the vector representation of EMRs with information extracted from different knowledge graphs before learning and predicting. In addition, we performed an automatic selection of the features derived from knowledge graphs to distinguish noisy ones from those that can benefit decision making. We report the results of our experiments on the PRIMEGE PACA database, which contains more than 600,000 consultations carried out by 17 general practitioners (GPs).
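As a rough illustration of the enrichment idea described in the Methods, the sketch below appends concept features to a bag-of-words vector. It is a minimal sketch under stated assumptions, not the authors' pipeline: the abstract does not name the knowledge graphs, the annotators, or the vectorization scheme, so `TOY_GRAPH`, the concept labels, and the helper functions are all hypothetical, and the feature selection step is not shown.

```python
from collections import Counter

# Hypothetical toy "knowledge graph": maps a term found in a record to
# broader concepts. The real graphs and annotators are not given in the
# abstract; these entries are invented for illustration only.
TOY_GRAPH = {
    "metformin": ["concept:antidiabetic_drug"],
    "insulin": ["concept:antidiabetic_drug"],
    "dyspnea": ["concept:respiratory_symptom"],
}


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def enrich(tokens, graph):
    """Return the original tokens plus every concept linked to them."""
    concepts = [c for t in tokens for c in graph.get(t, [])]
    return tokens + concepts


def vectorize(docs, graph=None):
    """Build count vectors over a shared, sorted vocabulary.

    When a graph is supplied, concept features are added as extra columns.
    """
    bags = []
    for doc in docs:
        tokens = tokenize(doc)
        if graph is not None:
            tokens = enrich(tokens, graph)
        bags.append(Counter(tokens))
    vocab = sorted(set().union(*bags))
    return vocab, [[bag[term] for term in vocab] for bag in bags]


records = ["patient on metformin", "patient on insulin with dyspnea"]
vocab_plain, X_plain = vectorize(records)
vocab_kg, X_kg = vectorize(records, TOY_GRAPH)
```

In this toy setup the two records share no drug term in the plain representation, but both receive a count in the `concept:antidiabetic_drug` column after enrichment, which is the kind of shared signal a downstream classifier could exploit.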

Results: A statistical evaluation shows that our proposed approach improves hospitalization prediction. More precisely, injecting features extracted from cross-domain knowledge graphs in the vector representation of EMRs given as input to the prediction algorithm significantly increases the F1 score of the prediction.
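For readers unfamiliar with the metric named in the Results, the F1 score is the harmonic mean of precision and recall. The function below is a minimal sketch of the standard binary definition, assuming labels are encoded as 0 and 1 with 1 meaning "hospitalized"; the abstract does not state how the authors encoded labels or averaged the score, and the function name is chosen here for illustration.

```python
def f1_score(y_true, y_pred):
    """Binary F1 for labels in {0, 1}, with 1 as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    if tp == 0:
        # Convention used here: F1 is 0 when nothing is correctly
        # predicted positive, which also avoids division by zero.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

Because it ignores true negatives, F1 is a common choice when the positive class is the event of interest and negatives are plentiful.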

Conclusions: By injecting knowledge from recognized reference sources into the representation of EMRs, it is possible to significantly improve the prediction of medical events. Future work would be to evaluate the impact of a feature selection step coupled with a combination of features extracted from several knowledge graphs. A possible avenue is to study more hierarchical levels and properties related to concepts, as well as to integrate more semantic annotators to exploit unstructured data.


Source journal

Journal of Biomedical Semantics (MATHEMATICAL & COMPUTATIONAL BIOLOGY)

CiteScore: 4.20

Self-citation rate: 5.30%

Review time: 30 weeks

About the journal: Journal of Biomedical Semantics addresses issues of semantic enrichment and semantic processing in the biomedical domain. The scope of the journal covers two main areas. Infrastructure for biomedical semantics: focusing on semantic resources and repositories, meta-data management and resource description, knowledge representation and semantic frameworks, the Biomedical Semantic Web, and semantic interoperability. Semantic mining, annotation, and analysis: focusing on approaches and applications of semantic resources, and on tools for investigation, reasoning, prediction, and discoveries in biomedicine.