MixEHR-SurG:从电子健康记录中推断死亡率相关主题的联合比例危险和引导主题模型

IF 4 2区 医学 Q2 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS
Yixuan Li , Archer Y. Yang , Ariane Marelli , Yue Li
{"title":"MixEHR-SurG:从电子健康记录中推断死亡率相关主题的联合比例危险和引导主题模型","authors":"Yixuan Li ,&nbsp;Archer Y. Yang ,&nbsp;Ariane Marelli ,&nbsp;Yue Li","doi":"10.1016/j.jbi.2024.104638","DOIUrl":null,"url":null,"abstract":"<div><p>Survival models can help medical practitioners to evaluate the prognostic importance of clinical variables to patient outcomes such as mortality or hospital readmission and subsequently design personalized treatment regimes. Electronic Health Records (EHRs) hold the promise for large-scale survival analysis based on systematically recorded clinical features for each patient. However, existing survival models either do not scale to high dimensional and multi-modal EHR data or are difficult to interpret. In this study, we present a supervised topic model called MixEHR-SurG to simultaneously integrate heterogeneous EHR data and model survival hazard. Our contributions are three-folds: (1) integrating EHR topic inference with Cox proportional hazards likelihood; (2) integrating patient-specific topic hyperparameters using the PheCode concepts such that each topic can be identified with exactly one PheCode-associated phenotype; (3) multi-modal survival topic inference. This leads to a highly interpretable survival topic model that can infer PheCode-specific phenotype topics associated with patient mortality. We evaluated MixEHR-SurG using a simulated dataset and two real-world EHR datasets: the Quebec Congenital Heart Disease (CHD) data consisting of 8211 subjects with 75,187 outpatient claim records of 1767 unique ICD codes; the MIMIC-III consisting of 1458 subjects with multi-modal EHR records. Compared to the baselines, MixEHR-SurG achieved a superior dynamic AUROC for mortality prediction, with a mean AUROC score of 0.89 in the simulation dataset and a mean AUROC of 0.645 on the CHD dataset. Qualitatively, MixEHR-SurG associates severe cardiac conditions with high mortality risk among the CHD patients after the first heart failure hospitalization and critical brain injuries with increased mortality among the MIMIC-III patients after their ICU discharge. Together, the integration of the Cox proportional hazards model and EHR topic inference in MixEHR-SurG not only leads to competitive mortality prediction but also meaningful phenotype topics for in-depth survival analysis. The software is available at GitHub: <span>https://github.com/li-lab-mcgill/MixEHR-SurG</span><svg><path></path></svg>.</p></div>","PeriodicalId":15263,"journal":{"name":"Journal of Biomedical Informatics","volume":null,"pages":null},"PeriodicalIF":4.0000,"publicationDate":"2024-04-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S153204642400056X/pdfft?md5=2e00f7bfa6e4631f19ffcdcf5d0e1985&pid=1-s2.0-S153204642400056X-main.pdf","citationCount":"0","resultStr":"{\"title\":\"MixEHR-SurG: A joint proportional hazard and guided topic model for inferring mortality-associated topics from electronic health records\",\"authors\":\"Yixuan Li ,&nbsp;Archer Y. Yang ,&nbsp;Ariane Marelli ,&nbsp;Yue Li\",\"doi\":\"10.1016/j.jbi.2024.104638\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><p>Survival models can help medical practitioners to evaluate the prognostic importance of clinical variables to patient outcomes such as mortality or hospital readmission and subsequently design personalized treatment regimes. Electronic Health Records (EHRs) hold the promise for large-scale survival analysis based on systematically recorded clinical features for each patient. However, existing survival models either do not scale to high dimensional and multi-modal EHR data or are difficult to interpret. In this study, we present a supervised topic model called MixEHR-SurG to simultaneously integrate heterogeneous EHR data and model survival hazard. Our contributions are three-folds: (1) integrating EHR topic inference with Cox proportional hazards likelihood; (2) integrating patient-specific topic hyperparameters using the PheCode concepts such that each topic can be identified with exactly one PheCode-associated phenotype; (3) multi-modal survival topic inference. This leads to a highly interpretable survival topic model that can infer PheCode-specific phenotype topics associated with patient mortality. We evaluated MixEHR-SurG using a simulated dataset and two real-world EHR datasets: the Quebec Congenital Heart Disease (CHD) data consisting of 8211 subjects with 75,187 outpatient claim records of 1767 unique ICD codes; the MIMIC-III consisting of 1458 subjects with multi-modal EHR records. Compared to the baselines, MixEHR-SurG achieved a superior dynamic AUROC for mortality prediction, with a mean AUROC score of 0.89 in the simulation dataset and a mean AUROC of 0.645 on the CHD dataset. Qualitatively, MixEHR-SurG associates severe cardiac conditions with high mortality risk among the CHD patients after the first heart failure hospitalization and critical brain injuries with increased mortality among the MIMIC-III patients after their ICU discharge. Together, the integration of the Cox proportional hazards model and EHR topic inference in MixEHR-SurG not only leads to competitive mortality prediction but also meaningful phenotype topics for in-depth survival analysis. The software is available at GitHub: <span>https://github.com/li-lab-mcgill/MixEHR-SurG</span><svg><path></path></svg>.</p></div>\",\"PeriodicalId\":15263,\"journal\":{\"name\":\"Journal of Biomedical Informatics\",\"volume\":null,\"pages\":null},\"PeriodicalIF\":4.0000,\"publicationDate\":\"2024-04-15\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://www.sciencedirect.com/science/article/pii/S153204642400056X/pdfft?md5=2e00f7bfa6e4631f19ffcdcf5d0e1985&pid=1-s2.0-S153204642400056X-main.pdf\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Journal of Biomedical Informatics\",\"FirstCategoryId\":\"3\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S153204642400056X\",\"RegionNum\":2,\"RegionCategory\":\"医学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q2\",\"JCRName\":\"COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Biomedical Informatics","FirstCategoryId":"3","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S153204642400056X","RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS","Score":null,"Total":0}
引用次数: 0

摘要

生存模型可以帮助医疗从业人员评估临床变量对病人预后(如死亡率或再入院率)的重要性,进而设计个性化的治疗方案。电子健康记录(EHR)可根据系统记录的每位患者的临床特征进行大规模生存分析。然而,现有的生存模型要么无法扩展到高维和多模态的电子病历数据,要么难以解释。在本研究中,我们提出了一种名为 MixEHR-SurG 的有监督主题模型,可同时整合异构电子病历数据并建立生存危险模型。我们的贡献体现在三个方面:(1)将 EHR 主题推断与 Cox 比例危险似然法相结合;(2)使用 PheCode 概念整合患者特定的主题超参数,从而使每个主题都能与精确的一种 PheCode 相关表型相识别;(3)多模态生存主题推断。这将产生一个高度可解释的生存主题模型,该模型可以推断出与患者死亡率相关的特定 PheCode 表型主题。我们使用一个模拟数据集和两个现实世界的电子病历数据集对 MixEHR-SurG 进行了评估:魁北克先天性心脏病(CHD)数据由 8211 个受试者组成,包含 75,187 条门诊索赔记录和 1767 个独特的 ICD 编码;MIMIC-III 由 1458 个受试者组成,包含多种模式的电子病历记录。与基线相比,MixEHR-SurG 在死亡率预测方面实现了更优越的动态 AUROC,模拟数据集的平均 AUROC 得分为 0.89,CHD 数据集的平均 AUROC 得分为 0.645。从定性角度看,MixEHR-SurG 将严重心脏疾病与首次心衰住院后 CHD 患者的高死亡率风险联系在一起,将重症脑损伤与 MIMIC-III 患者出院后 ICU 死亡率增加联系在一起。MixEHR-SurG将Cox比例危险模型和电子病历主题推断结合在一起,不仅能预测有竞争力的死亡率,还能为深入的生存分析提供有意义的表型主题。该软件可在 GitHub 上获取:https://github.com/li-lab-mcgill/MixEHR-SurG。
本文章由计算机程序翻译,如有差异,请以英文原文为准。

MixEHR-SurG: A joint proportional hazard and guided topic model for inferring mortality-associated topics from electronic health records

MixEHR-SurG: A joint proportional hazard and guided topic model for inferring mortality-associated topics from electronic health records

Survival models can help medical practitioners to evaluate the prognostic importance of clinical variables to patient outcomes such as mortality or hospital readmission and subsequently design personalized treatment regimes. Electronic Health Records (EHRs) hold the promise for large-scale survival analysis based on systematically recorded clinical features for each patient. However, existing survival models either do not scale to high dimensional and multi-modal EHR data or are difficult to interpret. In this study, we present a supervised topic model called MixEHR-SurG to simultaneously integrate heterogeneous EHR data and model survival hazard. Our contributions are three-folds: (1) integrating EHR topic inference with Cox proportional hazards likelihood; (2) integrating patient-specific topic hyperparameters using the PheCode concepts such that each topic can be identified with exactly one PheCode-associated phenotype; (3) multi-modal survival topic inference. This leads to a highly interpretable survival topic model that can infer PheCode-specific phenotype topics associated with patient mortality. We evaluated MixEHR-SurG using a simulated dataset and two real-world EHR datasets: the Quebec Congenital Heart Disease (CHD) data consisting of 8211 subjects with 75,187 outpatient claim records of 1767 unique ICD codes; the MIMIC-III consisting of 1458 subjects with multi-modal EHR records. Compared to the baselines, MixEHR-SurG achieved a superior dynamic AUROC for mortality prediction, with a mean AUROC score of 0.89 in the simulation dataset and a mean AUROC of 0.645 on the CHD dataset. Qualitatively, MixEHR-SurG associates severe cardiac conditions with high mortality risk among the CHD patients after the first heart failure hospitalization and critical brain injuries with increased mortality among the MIMIC-III patients after their ICU discharge. Together, the integration of the Cox proportional hazards model and EHR topic inference in MixEHR-SurG not only leads to competitive mortality prediction but also meaningful phenotype topics for in-depth survival analysis. The software is available at GitHub: https://github.com/li-lab-mcgill/MixEHR-SurG.

求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
Journal of Biomedical Informatics
Journal of Biomedical Informatics 医学-计算机:跨学科应用
CiteScore
8.90
自引率
6.70%
发文量
243
审稿时长
32 days
期刊介绍: The Journal of Biomedical Informatics reflects a commitment to high-quality original research papers, reviews, and commentaries in the area of biomedical informatics methodology. Although we publish articles motivated by applications in the biomedical sciences (for example, clinical medicine, health care, population health, and translational bioinformatics), the journal emphasizes reports of new methodologies and techniques that have general applicability and that form the basis for the evolving science of biomedical informatics. Articles on medical devices; evaluations of implemented systems (including clinical trials of information technologies); or papers that provide insight into a biological process, a specific disease, or treatment options would generally be more suitable for publication in other venues. Papers on applications of signal processing and image analysis are often more suitable for biomedical engineering journals or other informatics journals, although we do publish papers that emphasize the information management and knowledge representation/modeling issues that arise in the storage and use of biological signals and images. System descriptions are welcome if they illustrate and substantiate the underlying methodology that is the principal focus of the report and an effort is made to address the generalizability and/or range of application of that methodology. Note also that, given the international nature of JBI, papers that deal with specific languages other than English, or with country-specific health systems or approaches, are acceptable for JBI only if they offer generalizable lessons that are relevant to the broad JBI readership, regardless of their country, language, culture, or health system.
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信