Informatics-driven unsupervised learning of comorbidity clusters for COVID-19 reinfection risk: A finite mixture modeling approach
Grant B. Morgan, Andreas Stamatis, Chelsea C. Yager, Ali Boolani
Informatics in Medicine Unlocked, vol. 55, Article 101649 (2025). DOI: 10.1016/j.imu.2025.101649
https://www.sciencedirect.com/science/article/pii/S2352914825000371
Abstract
Purpose
This study applied an informatics-focused, unsupervised learning framework (finite mixture modeling) to determine whether distinct clusters of coexisting conditions among patients with coronavirus disease 2019 (COVID-19) are associated with multiple (reinfection) versus single infections.
Methods
We analyzed 42,974 patient records containing COVID-19 diagnoses using a machine learning classification algorithm to identify comorbidity profiles. Of nearly 850 recorded conditions, the 29 that occurred in at least 5 % of the sample were retained. We then compared patients with single versus multiple COVID-19 diagnoses within each profile.
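As a rough illustration of the two steps described in the Methods (a prevalence filter followed by a finite mixture model), the following Python sketch may help. It is not the authors' implementation: the abstract does not state the software or the exact model specification, so a mixture of independent Bernoulli variables (latent class analysis) fitted by EM is assumed here. All function names, the toy records, and the condition labels are hypothetical, and choosing the number of profiles (for example by comparing information criteria) is not shown.

```python
import math
import random


def filter_by_prevalence(records, min_share=0.05):
    """Return condition codes present in at least min_share of records."""
    counts = {}
    for conditions in records:
        for code in set(conditions):
            counts[code] = counts.get(code, 0) + 1
    n = len(records)
    return sorted(code for code, c in counts.items() if c / n >= min_share)


def to_binary_matrix(records, kept):
    """One row per patient, one 0/1 column per retained condition."""
    return [[1 if code in conditions else 0 for code in kept]
            for conditions in records]


def responsibilities(X, weights, probs):
    """E-step: posterior probability of each profile for each patient."""
    resp = []
    for x in X:
        logp = []
        for w, p_row in zip(weights, probs):
            lp = math.log(w)
            for xi, p in zip(x, p_row):
                lp += math.log(p if xi else 1.0 - p)
            logp.append(lp)
        top = max(logp)  # subtract the max for numerical stability
        unnorm = [math.exp(v - top) for v in logp]
        total = sum(unnorm)
        resp.append([u / total for u in unnorm])
    return resp


def fit_bernoulli_mixture(X, k, n_iter=100, seed=0):
    """EM for a k-component mixture of independent Bernoulli variables."""
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    weights = [1.0 / k] * k
    probs = [[rng.uniform(0.25, 0.75) for _ in range(d)] for _ in range(k)]
    for _ in range(n_iter):
        resp = responsibilities(X, weights, probs)
        # M-step: update profile sizes and per-condition probabilities.
        for j in range(k):
            nj = max(sum(r[j] for r in resp), 1e-12)
            weights[j] = max(nj / n, 1e-12)
            probs[j] = [
                min(max(sum(r[j] * x[i] for r, x in zip(resp, X)) / nj,
                        1e-6), 1 - 1e-6)
                for i in range(d)
            ]
    resp = responsibilities(X, weights, probs)
    labels = [max(range(k), key=lambda j: r[j]) for r in resp]
    return weights, probs, labels


# Toy data, fabricated for illustration only (not from the study).
records = ([set() for _ in range(29)] + [{"rare_code"}]
           + [{"hypertension", "hyperlipidemia"} for _ in range(10)])
kept = filter_by_prevalence(records)       # drops "rare_code" (1 of 40)
X = to_binary_matrix(records, kept)
weights, probs, labels = fit_bernoulli_mixture(X, k=2)
```

In this sketch each patient is assigned to the profile with the highest posterior probability, which is one common way to turn a fitted mixture into discrete groups; the study may have used a different assignment rule.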
Results
Three comorbidity profiles emerged. The first profile (Minimal Comorbidity) was the largest (67 % of the sample) and was characterized by few additional conditions. Patients classified into this profile were also 20–30 years younger, on average, than members of the other profiles. The second profile (Elevated Select Comorbidity) comprised 24 % of the sample and was characterized by moderate-risk factors such as hypertension, hyperlipidemia, and acute respiratory failure. The third profile (High Comorbidity Burden) comprised 9 % of the sample and was characterized by conditions related to the cardiovascular, renal, endocrine, and respiratory systems. In the high-burden group, 30 % of patients experienced reinfection, versus only 9 % in the minimal group. Overall, patients with more extensive cardiometabolic or pulmonary conditions were more likely to experience repeated infection.
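To put the reported reinfection proportions on a common scale, the short Python sketch below computes a crude risk ratio and odds ratio for the high-burden group relative to the minimal group. It uses only the rounded percentages quoted above (30 % and 9 %), so the results are unadjusted approximations, and the full paper may report different effect measures. The function names are hypothetical.

```python
def risk_ratio(p_group, p_reference):
    """Ratio of two proportions (crude, unadjusted)."""
    return p_group / p_reference


def odds_ratio(p_group, p_reference):
    """Ratio of the odds implied by two proportions (crude, unadjusted)."""
    odds_group = p_group / (1.0 - p_group)
    odds_reference = p_reference / (1.0 - p_reference)
    return odds_group / odds_reference


# Rounded proportions taken from the abstract.
high_burden = 0.30
minimal = 0.09

rr = risk_ratio(high_burden, minimal)       # about 3.3
odds = odds_ratio(high_burden, minimal)     # about 4.3
print(f"crude risk ratio: {rr:.2f}, crude odds ratio: {odds:.2f}")
```

On these rounded figures, reinfection was roughly three times as common in the high-burden profile as in the minimal profile.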
Conclusions
By identifying and characterizing comorbidity clusters, this informatics-based approach offers deeper insight into COVID-19 reinfection dynamics. The findings may support targeted prevention, data-driven resource allocation, and precision medicine strategies by highlighting subgroups at elevated risk. Moreover, the unsupervised modeling framework is potentially adaptable to other multifactorial conditions, underscoring its broader utility in medical informatics.
About the journal:
Informatics in Medicine Unlocked (IMU) is an international gold open access journal covering a broad spectrum of topics within medical informatics, including (but not limited to) papers focusing on imaging, pathology, teledermatology, public health, ophthalmological, nursing and translational medicine informatics. The full papers that are published in the journal are accessible to all who visit the website.