Dean T Eurich, Darren Lau, Weiting Li, Olivia Weaver, Tanya Joon, Ming Ye, Finlay A McAlister, Padma Kaul, Salim Samanani
Canadian Journal of Diabetes. Published 2025-09-10. DOI: 10.1016/j.jcjd.2025.09.001 (https://doi.org/10.1016/j.jcjd.2025.09.001)
Predicting the risk of COVID-19 among adult patients with diabetes: A machine learning approach.
Objectives: To develop a machine learning model that accurately predicts the risk of acquiring COVID-19 in community-dwelling adults with type 1 and/or type 2 diabetes in Alberta, Canada.
Methods: This supervised machine learning prediction study included adults (≥18 years old) living in Alberta, Canada, between April 1, 2019, and March 31, 2021, with pre-existing diabetes (n=372,055; n=2,541 were excluded due to migration, leaving a final sample of 369,514). The outcome of interest was a positive SARS-CoV-2 PCR test result between March 1, 2020, and March 1, 2021. Model features were extracted from routinely collected Alberta administrative health data from March 1, 2015, to March 1, 2020. Fifteen algorithms were trained on 67% of the data, and the top performer (Light Gradient Boost Model, LGBoost) was validated on the remaining 33%. The model was calibrated, and its performance was assessed using the area under the receiver operating characteristic curve (AUROC), the area under the precision-recall curve (AUPRC), and threshold analyses.
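The evaluation design described in the Methods can be sketched in a few lines of plain Python. This is a minimal illustration on toy data, not the authors' pipeline: the function names `split_67_33`, `auroc`, and `average_precision` are hypothetical, the random shuffle is an assumption (the abstract does not say how the 67%/33% split was drawn), and average precision is used here as one common step-wise estimate of AUPRC.

```python
import random


def split_67_33(items, seed=0):
    """Shuffle and split into ~67% training and ~33% validation (assumed random split)."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = round(0.67 * len(shuffled))
    return shuffled[:cut], shuffled[cut:]


def auroc(labels, scores):
    """AUROC as the probability that a random positive outscores a random negative.

    Ties count as half a win. This pairwise form is O(n_pos * n_neg), so it is
    only suitable for small illustrative inputs.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Step-wise AUPRC estimate: mean precision at the rank of each positive.

    Assumes no tied scores, which holds for the toy data below.
    """
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    hits, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank
    return total / hits


# Toy labels (1 = positive test) and predicted risks, invented for illustration.
toy_labels = [0, 0, 1, 0, 1, 1]
toy_scores = [0.10, 0.40, 0.35, 0.80, 0.70, 0.90]

train_ids, valid_ids = split_67_33(range(100))
toy_auroc = auroc(toy_labels, toy_scores)
toy_ap = average_precision(toy_labels, toy_scores)
print(len(train_ids), len(valid_ids), round(toy_auroc, 3), round(toy_ap, 3))
```

An AUROC of 0.5 corresponds to random ranking, whereas the chance-level AUPRC equals the outcome prevalence, which is why the two metrics are reported together for a rare outcome.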
Results: Among 369,514 individuals with diabetes, 140,511 were tested, of whom 13,082 had a positive SARS-CoV-2 test. The LGBoost model incorporated 367 features and achieved an AUROC of 0.69 and an AUPRC of 0.08. The model was well calibrated at common risk thresholds (<0.2 probability) and had high specificity (≥0.98 at all thresholds); however, sensitivity and positive predictive value were low at all thresholds (≤0.08 and ≤0.18, respectively).
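The arithmetic behind these figures can be checked directly from the counts in the abstract. The sketch below is illustrative only: it assumes the outcome prevalence is taken over the whole cohort, and it pairs the reported bounds (sensitivity 0.08, specificity 0.98) as if they occurred at a single threshold, which the abstract does not state. The helper `ppv` is a hypothetical name for the standard Bayes' rule expression.

```python
# Counts reported in the abstract.
identified = 372_055
excluded = 2_541
cohort = identified - excluded   # final sample size
tested = 140_511
positive = 13_082

prevalence = positive / cohort        # outcome rate over the whole cohort (assumed denominator)
tested_fraction = tested / cohort     # share of the cohort with any PCR test
positivity = positive / tested        # positive share among those tested


def ppv(sensitivity, specificity, prev):
    """Positive predictive value from Bayes' rule."""
    true_pos = sensitivity * prev
    false_pos = (1 - specificity) * (1 - prev)
    return true_pos / (true_pos + false_pos)


# Illustrative pairing of the reported bounds, not a reported operating point.
illustrative_ppv = ppv(0.08, 0.98, prevalence)

# A chance-level AUPRC equals the prevalence, so this ratio is the lift over chance.
auprc_lift = 0.08 / prevalence

print(f"cohort={cohort}, prevalence={prevalence:.4f}, "
      f"tested={tested_fraction:.4f}, positivity={positivity:.4f}")
print(f"illustrative PPV={illustrative_ppv:.3f}, AUPRC lift={auprc_lift:.2f}")
```

Under these assumptions the outcome occurs in roughly 3.5% of the cohort, so an AUPRC of 0.08 is a little over twice the chance level, and even 98% specificity leaves most flagged individuals as false positives.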
Conclusions: The LGBoost model lacked the sensitivity to be clinically useful in predicting SARS-CoV-2 infection in Albertans with diabetes. Alternative data sources may be required to improve future COVID-19 prediction models for community-dwelling populations.