AI4CDI: Introducing a novel machine learning approach to demonstrate feasibility of timely and early identification of at-risk populations for Clostridioides difficile infections

Impact factor: 2.6; Zone 3 (Biology); JCR Q3 (Microbiology)
Anastasia Karatzia, Danai Aristeridou, Wawi Kantz, A. Carmine Colavecchia, Harish Madhava, Mohammad Ateya, Carole Czudek, Patrick H. Kelly, Kate Halsby
Anaerobe, volume 94, Article 102978. Published 2025-06-06. DOI: 10.1016/j.anaerobe.2025.102978
https://www.sciencedirect.com/science/article/pii/S1075996425000411
Citations: 0

Abstract

Objective

We evaluated the feasibility of using a machine learning (ML) model to predict Clostridioides difficile infection (CDI) six months before onset and to identify early predictors over a longer period.

Methods

A retrospective analysis was performed using electronic health records data from US adults (Optum Market Clarity). Cases with CDI and non-CDI controls were identified. A 1:1 coarsened exact matching algorithm was applied, yielding final analysis cohorts of 4736 cases and 4732 controls. CDI-relevant features were identified from the published literature, and information was extracted for >900 features. The final model was trained on 597 mostly binary features. Feature information from the 6 months before the date of first CDI diagnosis was hidden from the model so that it could identify patients at risk for CDI over a longer time horizon. A sensitivity analysis was conducted on cases aged 65–80 years.
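The two data-preparation steps described above, the 6-month hidden window and 1:1 coarsened exact matching, can be sketched roughly as follows. This is an illustrative sketch only, not the study's pipeline: the abstract does not state the matching covariates, the bin widths, or how "6 months" was counted, so the 183-day window, the 10-year age bands, the use of sex as a covariate, and all function and field names here are assumptions.

```python
from collections import defaultdict
from datetime import date, timedelta

BLACKOUT = timedelta(days=183)  # assumption: "6 months" approximated as 183 days


def visible_events(events, index_date, blackout=BLACKOUT):
    """Keep only (event_date, feature) pairs dated on or before index_date - blackout."""
    cutoff = index_date - blackout
    return [(d, f) for d, f in events if d <= cutoff]


def coarsen(patient):
    """Map a patient to a coarse stratum (assumed: 10-year age band plus sex)."""
    return (patient["age"] // 10, patient["sex"])


def match_one_to_one(cases, controls):
    """Pair each case with one unused control from the same stratum.

    Cases whose stratum has no remaining control are dropped.
    """
    pool = defaultdict(list)
    for control in controls:
        pool[coarsen(control)].append(control)
    pairs = []
    for case in cases:
        candidates = pool[coarsen(case)]
        if candidates:
            pairs.append((case["id"], candidates.pop(0)["id"]))
    return pairs


# Toy data, invented for illustration
events = [(date(2020, 1, 10), "ppi_use"), (date(2020, 9, 1), "antibiotic")]
print(visible_events(events, date(2020, 12, 1)))  # only the January event remains

cases = [{"id": "c1", "age": 67, "sex": "F"}, {"id": "c2", "age": 45, "sex": "M"}]
controls = [{"id": "k1", "age": 61, "sex": "F"}, {"id": "k2", "age": 72, "sex": "M"}]
print(match_one_to_one(cases, controls))  # [('c1', 'k1')]
```

Matching on coarsened strata rather than exact values trades some covariate balance for a larger matched sample; the bin widths control that trade-off.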

Results

Median age was 65 years (19–88) in the case and control cohorts. The Gradient Boosted Trees ML model had an area under the receiver operating characteristic curve (AUC-ROC) of 0.79. Post-model bias evaluation revealed disparities in sensitivity by race. Long-term predictors included hospitalization days. While some predictors were exclusive to the 65–80 years model, others were more strongly associated with CDI in the overall model.
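For readers unfamiliar with the two metrics reported above, the sketch below shows one common way to compute AUC-ROC (as the probability that a randomly chosen case scores higher than a randomly chosen control) and sensitivity broken down by subgroup. The abstract does not say how either metric was computed or which decision threshold was used, so the function names and the toy data are assumptions made purely for illustration.

```python
from collections import defaultdict


def auc_roc(y_true, scores):
    """AUC as P(score of a case > score of a control); ties count as 0.5."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case and one control")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def sensitivity_by_group(y_true, y_pred, groups):
    """True-positive rate among cases, computed separately per group label."""
    true_pos = defaultdict(int)
    n_cases = defaultdict(int)
    for y, p, g in zip(y_true, y_pred, groups):
        if y == 1:
            n_cases[g] += 1
            true_pos[g] += int(p == 1)
    return {g: true_pos[g] / n_cases[g] for g in n_cases}


# Toy data, invented for illustration
print(auc_roc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]))  # 0.75
print(sensitivity_by_group([1, 1, 1, 0], [1, 0, 1, 1], ["A", "A", "B", "B"]))
# {'A': 0.5, 'B': 1.0}
```

A gap between the per-group values, as in this toy output, is the kind of disparity a post-model bias evaluation looks for.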

Conclusions

We developed an ML model that can identify patient groups at increased risk for primary CDI. While the predictive capability of this ML model is promising, validation is needed before exploring its readiness for use in healthcare settings to inform preventive measures for CDI.
Source journal: Anaerobe (Biology: Microbiology)
CiteScore: 5.20
Self-citation rate: 8.70%
Articles published: 137
Review time: 76 days
About the journal: Anaerobe is essential reading for those who wish to remain at the forefront of discoveries relating to the life processes of strict anaerobes. The journal is multi-disciplinary and provides a unique forum for those investigating anaerobic organisms that cause infections in humans and animals, as well as anaerobes that play roles in microbiomes or environmental processes. Anaerobe publishes reviews, mini reviews, original research articles, notes and case reports. Relevant topics fall into the broad categories of anaerobes in human and animal diseases, anaerobes in the microbiome, anaerobes in the environment, diagnosis of anaerobes in clinical microbiology laboratories, molecular biology, genetics, pathogenesis, toxins and antibiotic susceptibility of anaerobic bacteria.