Development of a machine learning-based prediction model for serious bacterial infections in febrile young infants
Jun Sung Park, Reenar Yoo, Soo-Young Lim, Dahyun Kim, Min Kyo Chun, Jeeho Han, Jeong-Yong Lee, Seung Jun Choi, Seak Hee Oh, Jong Seung Lee, Jina Lee
BMJ Paediatrics Open, 9(1). Published 2025-07-30. DOI: 10.1136/bmjpo-2025-003548 (https://doi.org/10.1136/bmjpo-2025-003548)
Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12314954/pdf/
Abstract
Background: This study aimed to develop and validate machine learning (ML)-based models for predicting serious bacterial infections (SBIs) in febrile infants aged ≤90 days.
Methods: This retrospective study analysed data from febrile infants (body temperature ≥38.0 °C) aged ≤90 days. The development dataset comprised patients who visited the Asan Medical Center in Seoul between 2015 and 2021, whereas the validation dataset comprised patients who visited the centre from January 2022 to August 2023. Logistic regression (LR) and eXtreme Gradient Boosting (XGB) were used to develop SBI prediction models, which were then compared with traditional rule-based models.
Results: The study included data from 2860 patients: 2288 (80%) in the development dataset and 572 (20%) in the validation dataset. SBIs were confirmed in 482 patients (21.0%) in the development dataset and 131 (22.9%) in the validation dataset. The XGB and LR models achieved areas under the curve of 0.990 and 0.981, respectively, in the development dataset, and 0.989 and 0.985 in the validation dataset. In validation, both models showed higher specificity (82.3-87.0% vs 46.2-72.2%) and positive predictive value (61.5-68.5% vs 34.4-49.8%) than the traditional rule-based models. Both models also achieved 100% sensitivity and 100% negative predictive value, with no false negatives, compared with 81.7-100% and 92.0-100%, respectively, for the rule-based models. Urinalysis, C-reactive protein and procalcitonin were identified as top-tier features in the XGB model.
Conclusions: The ML-based prediction model demonstrated robust performance, with superior specificity and perfect sensitivity, which may enhance the accuracy of SBI detection and reduce the costs associated with false positives.
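
The sensitivity, specificity, positive predictive value (PPV) and negative predictive value (NPV) reported in the Results all derive from a 2x2 confusion matrix of predicted versus confirmed SBI status. The following is a minimal Python sketch of those standard definitions, not the authors' code. The names `ConfusionCounts` and `diagnostic_metrics` are hypothetical, and the example counts are invented for illustration; they are not taken from the study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Counts from comparing predictions with confirmed SBI status."""
    tp: int  # SBI present, predicted positive
    fp: int  # SBI absent, predicted positive
    tn: int  # SBI absent, predicted negative
    fn: int  # SBI present, predicted negative


def diagnostic_metrics(c: ConfusionCounts) -> dict[str, float]:
    """Return sensitivity, specificity, PPV and NPV as fractions in [0, 1]."""

    def ratio(numerator: int, denominator: int) -> float:
        # Return NaN instead of raising when a denominator is zero.
        return numerator / denominator if denominator else float("nan")

    return {
        "sensitivity": ratio(c.tp, c.tp + c.fn),  # true positive rate
        "specificity": ratio(c.tn, c.tn + c.fp),  # true negative rate
        "ppv": ratio(c.tp, c.tp + c.fp),          # precision
        "npv": ratio(c.tn, c.tn + c.fn),
    }


# Hypothetical counts chosen for illustration only; NOT the study's data.
example = ConfusionCounts(tp=50, fp=30, tn=120, fn=0)
metrics = diagnostic_metrics(example)
for name, value in metrics.items():
    print(f"{name}: {value:.1%}")
```

With zero false negatives, as in this invented example, sensitivity and NPV are both 100% regardless of the other counts, which is the pattern the abstract reports for the ML models. Specificity and PPV then depend only on how many false positives remain.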