Predicting risk of early-onset sepsis in low-resource neonatal units using routine healthcare data: development and evaluation of multivariable statistical and machine learning models.
Ed Lowther, Nushrat Khan, Mario Cortina-Borja, Gwendoline Lilly Chimhini, Samuel R Neal, Marcia Mangiza, Felicity Fitzgerald, Michelle Heys, Simbarashe Chimhuya
BMJ Paediatrics Open, vol. 9(1). Published 2025-09-29. DOI: 10.1136/bmjpo-2025-003617. Full text (PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12481410/pdf/
Abstract
Background: Neonatal sepsis is a major cause of morbidity and mortality in low-resource settings and accurate, context-appropriate diagnostic methods are urgently needed to improve clinical outcomes.
Methods: We used data collected with Neotree, an open-source digital health intervention tool, from neonates admitted to Sally Mugabe Central Hospital in Harare between February 2021 and September 2024. We modelled a composite outcome variable composed of the senior clinician-assigned diagnosis at discharge or cause of death, together with blood culture test results. Three statistical and machine learning algorithms were developed, tuned using cross-validation where appropriate, and evaluated.
Results: In total, 917 cases of early-onset neonatal sepsis were identified among the 18 345 neonates in our study sample, comprising 664 cases of clinician diagnosis and 253 positive blood culture results. With area under the receiver operating characteristic curve as a metric, LightGBM, a machine learning gradient-boosted tree classifier, performed marginally better (0.712; 95% CI 0.673 to 0.75) than logistic regression (0.687; 95% CI 0.646 to 0.728) on a held-out evaluation dataset. A simple and easily interpretable machine learning model, the k-neighbours classifier, offered comparable performance (0.699; 95% CI 0.662 to 0.736).
Conclusions: This study explored the potential advantages of using machine learning in the triage of neonates at risk of sepsis in low-resource settings where gold-standard blood culture test results are often unavailable. While the differences in performance metrics were not statistically significant, the machine learning approaches in our study offer other advantages including more intuitive predictions and the ability to handle missing data without imputation.
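The comparison reported in the Results rests on the area under the receiver operating characteristic curve (AUROC) and a 95% confidence interval computed on held-out data. The following is a minimal sketch of that kind of evaluation, not the authors' code. The abstract does not say how the intervals were obtained, so the percentile bootstrap used here is an assumption, and the labels and scores are synthetic (roughly 5% positives, close to 917 of 18 345). The names `auroc`, `bootstrap_ci` and `simulate` are hypothetical helpers introduced for illustration.

```python
import random


def auroc(labels, scores):
    """AUROC from the Mann-Whitney U statistic, using average ranks for ties."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the group of tied scores starting at i.
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUROC needs at least one positive and one negative")
    rank_sum_pos = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def bootstrap_ci(labels, scores, n_boot=500, alpha=0.05, seed=0):
    """Percentile bootstrap interval for AUROC (an assumed method)."""
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:  # skip resamples that contain only one class
            stats.append(auroc(ys, [scores[i] for i in idx]))
    stats.sort()
    lo_idx = max(0, round((alpha / 2) * n_boot))
    hi_idx = min(n_boot - 1, round((1 - alpha / 2) * n_boot) - 1)
    return stats[lo_idx], stats[hi_idx]


def simulate(n=1000, prevalence=0.05, shift=0.8, seed=1):
    """Synthetic held-out set: rare positives whose scores are shifted upward."""
    rng = random.Random(seed)
    labels = [1 if rng.random() < prevalence else 0 for _ in range(n)]
    scores = [rng.gauss(shift if y else 0.0, 1.0) for y in labels]
    return labels, scores


labels, scores = simulate()
estimate = auroc(labels, scores)
lower, upper = bootstrap_ci(labels, scores)
print(f"AUROC {estimate:.3f} (95% CI {lower:.3f} to {upper:.3f})")
```

With an outcome this rare, a held-out set contains few positive cases, so bootstrap intervals tend to be wide. Heavily overlapping intervals for two models, as in the abstract, are consistent with a difference that is not statistically significant, although a formal comparison would need a paired test on the same held-out cases.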