M. Shi, A. Yang, E. Lau, A. Luk, Ronald C W Ma, Alice P S Kong, Raymond S M Wong, Jones C M Chan, Juliana C N Chan, Elaine Chow
Journal: PLoS Medicine. Publication date: 2024-04-01. DOI: 10.1371/journal.pmed.1004369 (https://doi.org/10.1371/journal.pmed.1004369)
Citations: 0
Abstract
A novel electronic health record-based, machine-learning model to predict severe hypoglycemia leading to hospitalizations in older adults with diabetes: A territory-wide cohort and modeling study
Background Older adults with diabetes are at high risk of severe hypoglycemia (SH). Many machine-learning (ML) models that predict short-term hypoglycemia are not specific to older adults and show poor precision-recall performance. We aimed to develop a multidimensional, electronic health record (EHR)-based ML model to predict the one-year risk of SH requiring hospitalization in older adults with diabetes.
Methods and findings We adopted a case-control design for a retrospective territory-wide cohort of 1,456,618 records from 364,863 unique older adults (age ≥65 years) with diabetes and at least 1 Hong Kong Hospital Authority attendance from 2013 to 2018. We used 258 predictors, including demographics, admissions, diagnoses, medications, and routine laboratory tests, measured over a one-year period to predict SH events requiring hospitalization in the following 12 months. The cohort was randomly split into training, testing, and internal validation sets in a 7:2:1 ratio. Six ML algorithms were evaluated: logistic regression, random forest, gradient boosting machine, deep neural network (DNN), XGBoost, and RuleFit. We tested our model in a temporal validation cohort in the Hong Kong Diabetes Register (HKDR), with predictors defined in 2018 and outcome events defined in 2019. Predictive performance was assessed using the area under the receiver operating characteristic curve (AUROC), the area under the precision-recall curve (AUPRC), and positive predictive value (PPV). We identified 11,128 SH events requiring hospitalization during the observation periods. The XGBoost model yielded the best performance (AUROC = 0.978 [95% CI 0.972 to 0.984]; AUPRC = 0.670 [95% CI 0.652 to 0.688]; PPV = 0.721 [95% CI 0.703 to 0.739]). This was superior to an 11-variable conventional logistic regression model comprising age, sex, history of SH, hypertension, blood glucose, kidney function measurements, and use of oral glucose-lowering drugs (GLDs) (AUROC = 0.906; AUPRC = 0.085; PPV = 0.468). The most impactful predictors included non-use of lipid-regulating drugs, in-patient admission, urgent emergency triage, insulin use, and history of SH. External validation in the HKDR cohort yielded an AUROC of 0.856 [95% CI 0.838 to 0.873]. The main limitations of this study were the limited transportability of the model and the lack of geographically independent validation.
Conclusions Our novel ML model demonstrated good discrimination and high precision in predicting the one-year risk of SH requiring hospitalization. It may be integrated into EHR decision support systems for preemptive intervention in older adults at highest risk.
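Two points in the abstract can be made concrete with a little arithmetic: the 7:2:1 random split, and why a model can combine a high AUROC with a low AUPRC or PPV when events are rare. The Python sketch below is illustrative only and is not the authors' code. The function names `split_indices` and `ppv` are hypothetical, the sensitivity and specificity values are invented for the example, and the event rate assumes one SH event per record, which the abstract does not confirm (the case-control design may mean the modeling data had a different event rate).

```python
import random


def split_indices(n, ratios=(7, 2, 1), seed=0):
    """Shuffle range(n) and cut it into consecutive parts proportional to ratios."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    total = sum(ratios)
    parts, start = [], 0
    for i, r in enumerate(ratios):
        # The last part absorbs any rounding remainder so each index is used once.
        end = n if i == len(ratios) - 1 else start + (n * r) // total
        parts.append(idx[start:end])
        start = end
    return parts


def ppv(sensitivity, specificity, prevalence):
    """Positive predictive value from Bayes' rule."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


# Counts quoted in the abstract.
N_RECORDS = 1_456_618
N_EVENTS = 11_128
# Assumption: one event per record, so this is only a rough event rate.
prevalence = N_EVENTS / N_RECORDS

# Toy split of 1,000 hypothetical record indices into train/test/validation.
train, test, valid = split_indices(1000)
print(len(train), len(test), len(valid))
print(f"approximate event rate: {prevalence:.4f}")
# Hypothetical operating point, not reported in the abstract.
print(f"PPV at sens=0.90, spec=0.90: {ppv(0.90, 0.90, prevalence):.3f}")
```

Under these assumptions, an event rate below 1% means that even a classifier with 90% sensitivity and 90% specificity flags mostly false positives, which is why the abstract reports AUPRC and PPV alongside AUROC.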
About the journal:
PLOS Medicine is a prominent platform for discussing and researching global health challenges. The journal covers a wide range of topics, including biomedical, environmental, social, and political factors affecting health. It prioritizes articles that contribute to clinical practice, health policy, or a better understanding of pathophysiology, ultimately aiming to improve health outcomes across different settings.
The journal is committed to upholding the highest ethical standards in medical publishing. This includes actively managing and disclosing any conflicts of interest related to reporting, reviewing, and publishing. PLOS Medicine promotes transparency throughout the review and publication process. The journal also encourages data sharing and the reuse of published work. Additionally, authors retain copyright for their work, and the publication is made accessible through Open Access with no restrictions on availability and dissemination.
PLOS Medicine takes measures to avoid conflicts of interest associated with advertising drugs and medical devices or engaging in the exclusive sale of reprints.