Anastasia Karatzia, Danai Aristeridou, Wawi Kantz, A. Carmine Colavecchia, Harish Madhava, Mohammad Ateya, Carole Czudek, Patrick H. Kelly, Kate Halsby
Anaerobe, volume 94, Article 102978. Published 2025-06-06. DOI: 10.1016/j.anaerobe.2025.102978. https://www.sciencedirect.com/science/article/pii/S1075996425000411
AI4CDI: Introducing a novel machine learning approach to demonstrate feasibility of timely and early identification of at-risk populations for Clostridioides difficile infections
Objective
We evaluated the feasibility of a machine learning (ML) model for predicting Clostridioides difficile infection (CDI) six months prior to onset and for identifying early predictors over a longer period.
Methods
A retrospective analysis was performed using electronic health record data from US adults (Optum Market Clarity). Cases with CDI and non-CDI controls were identified. A 1:1 coarsened exact matching algorithm was applied, yielding final analysis cohorts of 4736 cases and 4732 controls. CDI-relevant features were identified from the published literature, and information was extracted for >900 features. The final model was trained on 597 mostly binary features. Feature information from the 6 months prior to the date of first CDI diagnosis was hidden from the model so that it would identify patients at risk for CDI over a longer time horizon. A sensitivity analysis was conducted on cases aged 65–80 years.
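The step of hiding the 6 months before first diagnosis can be sketched as follows. This is only an illustration under stated assumptions, not the authors' pipeline: the event layout, the feature names, the helper functions `visible_events` and `binary_features`, and the approximation of "6 months" as 183 days are all hypothetical.

```python
from datetime import date, timedelta

# Assumption: "6 months" is approximated as 183 days; the abstract does not
# state how the window was measured.
BLACKOUT_DAYS = 183


def visible_events(events, index_date, blackout_days=BLACKOUT_DAYS):
    """Keep only events dated before the blackout window preceding index_date."""
    cutoff = index_date - timedelta(days=blackout_days)
    return [e for e in events if e["date"] < cutoff]


def binary_features(events, feature_names):
    """Encode each feature as 1 if any visible event carries its code, else 0."""
    seen = {e["code"] for e in events}
    return {name: int(name in seen) for name in feature_names}


# Hypothetical patient: index_date stands in for the date of first CDI diagnosis.
index_date = date(2020, 7, 1)
events = [
    {"code": "antibiotic_use", "date": date(2019, 11, 15)},   # before the window
    {"code": "hospitalization", "date": date(2020, 5, 20)},   # inside the window
]

kept = visible_events(events, index_date)
features = binary_features(kept, ["antibiotic_use", "hospitalization"])
print(features)
```

In this toy record the hospitalization falls inside the hidden window, so only the earlier antibiotic exposure reaches the feature vector. How an equivalent reference date would be assigned to matched controls is not described in the abstract and is left out of the sketch.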
Results
Median age was 65 years (19–88) in case and control cohorts. The Gradient Boosted Trees ML model had an area under the receiver operating characteristic curve (AUC-ROC) of 0.79. Post-model bias evaluation revealed disparities in sensitivity by race. Long-term predictors included hospitalization days. While some predictors were exclusive to the 65–80 years model, others were more strongly associated with CDI in the overall model.
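The two evaluation quantities named above, AUC-ROC and sensitivity compared across groups, can be illustrated with a small sketch. The data, the group labels "A" and "B", the 0.5 decision threshold, and the helper functions `auc_roc` and `sensitivity_by_group` are invented for illustration; they are not the study's data or evaluation code.

```python
def auc_roc(y_true, scores):
    """AUC-ROC as the probability that a random case outscores a random control
    (ties count as half), which is equivalent to the area under the ROC curve."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case and one control")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def sensitivity_by_group(y_true, y_pred, groups):
    """Sensitivity (true positive rate) computed separately for each group."""
    true_pos, total_pos = {}, {}
    for y, p, g in zip(y_true, y_pred, groups):
        if y != 1:
            continue  # sensitivity only considers actual cases
        total_pos[g] = total_pos.get(g, 0) + 1
        true_pos[g] = true_pos.get(g, 0) + (1 if p == 1 else 0)
    return {g: true_pos[g] / total_pos[g] for g in total_pos}


# Toy data (hypothetical): 4 cases, 4 controls, two demographic groups.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.6, 0.8, 0.3, 0.2, 0.1]
groups = ["A", "A", "B", "B", "A", "A", "B", "B"]
y_pred = [int(s >= 0.5) for s in scores]  # assumed threshold of 0.5

auc = auc_roc(y_true, scores)
sens = sensitivity_by_group(y_true, y_pred, groups)
print(auc)   # 0.8125
print(sens)  # {'A': 1.0, 'B': 0.5}
```

A gap such as the one between groups A and B in this toy output is the kind of disparity a post-model bias evaluation would flag; the abstract does not report the magnitude of the disparity found in the study.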
Conclusions
We developed an ML model that can identify patient groups at increased risk for primary CDI. While the predictive capability of this ML model is promising, validation is needed before exploring its readiness for use in healthcare settings to inform preventive measures for CDI.
About the journal:
Anaerobe is essential reading for those who wish to remain at the forefront of discoveries relating to the life processes of strict anaerobes. The journal is multi-disciplinary and provides a unique forum for those investigating anaerobic organisms that cause infections in humans and animals, as well as anaerobes that play roles in microbiomes or environmental processes.
Anaerobe publishes reviews, mini reviews, original research articles, notes and case reports. Relevant topics fall into the broad categories of anaerobes in human and animal diseases, anaerobes in the microbiome, anaerobes in the environment, diagnosis of anaerobes in clinical microbiology laboratories, molecular biology, genetics, pathogenesis, toxins and antibiotic susceptibility of anaerobic bacteria.