Informatics-driven unsupervised learning of comorbidity clusters for COVID-19 reinfection risk: A finite mixture modeling approach

Q1 Medicine
Grant B. Morgan, Andreas Stamatis, Chelsea C. Yager, Ali Boolani
Journal: Informatics in Medicine Unlocked, volume 55, Article 101649
Published: 2025-01-01 (Journal Article)
DOI: 10.1016/j.imu.2025.101649
URL: https://www.sciencedirect.com/science/article/pii/S2352914825000371
Citation count: 0

Abstract

Purpose

This study applied an informatics-focused, unsupervised learning framework (finite mixture modeling) to determine whether distinct clusters of coexisting conditions among patients with coronavirus disease 2019 (COVID-19) are associated with multiple (reinfection) versus single infections.

Methods

We analyzed 42,974 patient records containing COVID-19 diagnoses using a machine learning classification algorithm to identify comorbidity profiles. Of nearly 850 recorded conditions, the 29 that occurred in at least 5 % of the sample were retained. We then compared patients with single versus multiple COVID-19 diagnoses within each profile.
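
The two steps described above (prevalence filtering, then finite mixture modeling of the retained conditions) can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes each condition is coded as a binary indicator per patient and that the mixture is a Bernoulli mixture (latent class model) fitted by expectation-maximization. The function names, the synthetic data, and all parameter values are hypothetical.

```python
import numpy as np


def filter_by_prevalence(X, min_prevalence=0.05):
    """Keep condition columns present in at least `min_prevalence` of patients."""
    prevalence = X.mean(axis=0)
    keep = prevalence >= min_prevalence
    return X[:, keep], keep


def fit_bernoulli_mixture(X, n_profiles, n_iter=200, tol=1e-6, seed=0):
    """Fit a mixture of independent Bernoullis by EM (assumed model, see lead-in).

    Returns mixing weights (K,), per-profile condition probabilities (K, d),
    and posterior profile memberships per patient (n, K).
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    weights = np.full(n_profiles, 1.0 / n_profiles)
    probs = rng.uniform(0.25, 0.75, size=(n_profiles, d))
    prev_ll = -np.inf
    for _ in range(n_iter):
        # E-step: log joint of each patient under each profile, then normalize.
        log_p = X @ np.log(probs).T + (1.0 - X) @ np.log(1.0 - probs).T
        log_p = log_p + np.log(weights)
        log_norm = np.logaddexp.reduce(log_p, axis=1, keepdims=True)
        resp = np.exp(log_p - log_norm)
        # M-step: update weights and condition probabilities; the small
        # constants guard against empty profiles and log(0).
        nk = resp.sum(axis=0) + 1e-9
        weights = nk / nk.sum()
        probs = np.clip((resp.T @ X) / nk[:, None], 1e-6, 1.0 - 1e-6)
        ll = float(log_norm.sum())
        if abs(ll - prev_ll) < tol:
            break
        prev_ll = ll
    return weights, probs, resp


# Synthetic demonstration data only (not the study data): two latent groups
# with low and high condition probabilities across 10 hypothetical conditions.
rng = np.random.default_rng(42)
group = rng.random(500) < 0.3
p_true = np.where(group[:, None], 0.6, 0.1)
X_synth = (rng.random((500, 10)) < p_true).astype(float)

X_kept, kept_mask = filter_by_prevalence(X_synth, min_prevalence=0.05)
weights, probs, resp = fit_bernoulli_mixture(X_kept, n_profiles=2)
profile = resp.argmax(axis=1)  # hard assignment to the most probable profile
```

In practice the number of profiles is not known in advance; a common approach is to fit several candidate values and compare them with a fit index such as BIC. The abstract does not state which criterion was used to arrive at three profiles.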

Results

Three comorbidity profiles emerged. The first profile (Minimal Comorbidity) was the largest (67 % of the sample) and was characterized by few additional conditions. Patients classified into this profile were also 20–30 years younger, on average, than members of the other profiles. The second profile (Elevated Select Comorbidity) comprised 24 % of the sample and was characterized by moderate-risk factors such as hypertension, hyperlipidemia, and acute respiratory failure. The third profile (High Comorbidity Burden) comprised 9 % of the sample and was characterized by conditions related to the cardiovascular, renal, endocrine, and respiratory systems. In the high-burden group, 30 % experienced reinfection, versus only 9 % in the minimal group. Overall, patients with more extensive cardiometabolic or pulmonary conditions were more likely to experience repeated infection.
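
To put the reported reinfection proportions (30 % versus 9 %) on a relative scale, the short sketch below computes an unadjusted risk ratio and odds ratio. These derived figures are not reported in the abstract; they are approximate because the inputs are rounded percentages, and they do not account for the age difference between profiles. The function names are hypothetical.

```python
def risk_ratio(p_group, p_reference):
    """Ratio of two proportions (unadjusted relative risk)."""
    return p_group / p_reference


def odds_ratio(p_group, p_reference):
    """Ratio of the odds implied by two proportions (unadjusted)."""
    return (p_group / (1.0 - p_group)) / (p_reference / (1.0 - p_reference))


# Proportions taken from the abstract: high-burden vs. minimal-comorbidity profile.
rr = risk_ratio(0.30, 0.09)
odds = odds_ratio(0.30, 0.09)
print(f"risk ratio ~ {rr:.2f}, odds ratio ~ {odds:.2f}")
```

On these rounded inputs, reinfection was roughly three times as frequent in the high-burden profile as in the minimal profile.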

Conclusions

By identifying and characterizing comorbidity clusters, this informatics-based approach offers deeper insight into COVID-19 reinfection dynamics. The findings may support targeted prevention, data-driven resource allocation, and precision medicine strategies by highlighting subgroups at elevated risk. Moreover, the unsupervised modeling framework is potentially adaptable to other multifactorial conditions, underscoring its broader utility in medical informatics.
Source journal
Informatics in Medicine Unlocked (Medicine: Health Informatics)
CiteScore: 9.50
Self-citation rate: 0.00%
Articles published: 282
Review time: 39 days
About the journal: Informatics in Medicine Unlocked (IMU) is an international gold open access journal covering a broad spectrum of topics within medical informatics, including (but not limited to) papers focusing on imaging, pathology, teledermatology, public health, ophthalmological, nursing and translational medicine informatics. The full papers published in the journal are accessible to all who visit the website.