Developing Machine-Learning Models to Predict Bacteremia in Febrile Adults Presenting to the Emergency Department: A Retrospective Cohort Study from a Large Center.
Authors: Chia-Ming Fu, Ike Ngo, Pak Sheung Lau, Yaroslav Ivanchuk, Fan-Ya Chou, Chih-Hung Wang, Chien-Yu Lin, Chu-Lin Tsai, Shey-Ying Chen, Tsung-Chien Lu, Hung-Yu Wei
Journal: Western Journal of Emergency Medicine, 26(3):617-626. Published 2025-05-30. Publication type: Journal Article.
DOI: 10.5811/westjem.35866 (https://doi.org/10.5811/westjem.35866)
Full text (PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12208070/pdf/
Abstract
Introduction: Bacteremia is common but difficult to diagnose early and, without prompt treatment, may result in significant morbidity and mortality. We aimed to develop machine-learning (ML) algorithms that identify bacteremia among febrile patients presenting to the emergency department (ED), using only data readily available at triage.
Methods: We included all adult patients (≥18 years of age) who presented to the ED of National Taiwan University Hospital (NTUH), a tertiary teaching hospital in Taiwan, with a chief complaint of fever or a measured body temperature above 38°C, and who had at least one blood culture drawn during the ED encounter. We extracted data from 2009 to 2018 from the NTUH Integrated Medical Database. The dataset included patient demographics, triage details, symptoms, and medical history. Bacteremia, defined as a blood culture positive for at least one potential pathogen, served as the binary classification label. We split the dataset into training/validation and testing sets (60:40 ratio) and trained five supervised ML models using K-fold cross-validation. Model performance was evaluated in the testing set using the area under the receiver operating characteristic curve (AUC).
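The workflow described in the Methods (a 60:40 split, K-fold cross-validation on the training/validation portion, and AUC on the held-out testing set) can be sketched roughly as follows. This is an illustrative sketch only, not the authors' code: the data are synthetic (generated with scikit-learn's `make_classification`), only two of the five model families are shown, and the choice of K = 5, the stratified split, and all hyperparameters are assumptions not stated in the abstract.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import StratifiedKFold, cross_val_score, train_test_split

# Synthetic stand-in for the triage features (NOT the NTUH data):
# roughly 12% positives, mirroring the reported bacteremia prevalence.
X, y = make_classification(
    n_samples=2000, n_features=20, n_informative=8,
    weights=[0.88, 0.12], random_state=0,
)

# 60:40 split into training/validation and testing sets.
# Stratifying on the label is an assumption of this sketch.
X_dev, X_test, y_dev, y_test = train_test_split(
    X, y, test_size=0.4, stratify=y, random_state=0
)

# Two of the model families named in the abstract; settings are illustrative.
models = {
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "gradient_boosting": GradientBoostingClassifier(random_state=0),
}

# K-fold cross-validation on the training/validation set (K = 5 is assumed).
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

results = {}
for name, model in models.items():
    # Mean cross-validated AUC, used for model comparison during development.
    cv_auc = cross_val_score(model, X_dev, y_dev, cv=cv, scoring="roc_auc").mean()
    # Refit on the full training/validation set, then score the untouched test set.
    model.fit(X_dev, y_dev)
    test_auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
    results[name] = {"cv_auc": cv_auc, "test_auc": test_auc}
    print(f"{name}: CV AUC = {cv_auc:.3f}, test AUC = {test_auc:.3f}")
```

Keeping the testing set out of cross-validation entirely is what lets the final AUC serve as an estimate of performance on unseen patients.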
Results: We included 80,201 cases, of which 48,120 were assigned to the training/validation set and 32,081 to the testing set. Bacteremia was identified in 5,831 (12.1%) cases in the training/validation set and 3,824 (11.9%) cases in the testing set. All ML models performed well: CatBoost achieved the highest AUC (.844, 95% confidence interval [CI] .837-.850), followed by extreme gradient boosting (.843, 95% CI .836-.849), gradient boosting (.842, 95% CI .836-.849), light gradient boosting machine (.841, 95% CI .834-.847), and random forest (.828, 95% CI .821-.834).
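The abstract does not say how the 95% CIs for the AUCs were obtained. One common approach is a percentile bootstrap over the testing set, sketched below in plain Python under that assumption. The function names `auc` and `bootstrap_auc_ci`, the number of resamples, and the synthetic labels and scores are all hypothetical and serve only to illustrate the calculation.

```python
import random

def auc(labels, scores):
    """Rank-based AUC: chance that a random positive outscores a random
    negative, with ties counted as one half (Mann-Whitney U statistic)."""
    pairs = sorted(zip(scores, labels))
    n = len(pairs)
    n_pos = sum(labels)
    n_neg = n - n_pos
    rank_sum_pos = 0.0
    i = 0
    while i < n:
        j = i
        while j < n and pairs[j][0] == pairs[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2  # mean of the 1-based ranks i+1 .. j
        rank_sum_pos += avg_rank * sum(lab for _, lab in pairs[i:j])
        i = j
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

def bootstrap_auc_ci(labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI: resample cases with replacement, recompute AUC."""
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[k] for k in idx]
        if 0 < sum(ys) < n:  # a resample needs both classes for AUC to exist
            stats.append(auc(ys, [scores[k] for k in idx]))
    stats.sort()
    lower = stats[round(n_boot * alpha / 2)]
    upper = stats[round(n_boot * (1 - alpha / 2)) - 1]
    return lower, upper

# Synthetic example (NOT study data): about 12% positives, whose scores
# are shifted upward relative to the negatives.
rng = random.Random(42)
labels = [1 if rng.random() < 0.12 else 0 for _ in range(2000)]
scores = [rng.gauss(1.4 if y else 0.0, 1.0) for y in labels]

point = auc(labels, scores)
lo, hi = bootstrap_auc_ci(labels, scores, n_boot=500)
print(f"AUC = {point:.3f}, 95% CI {lo:.3f}-{hi:.3f}")
```

With a testing set as large as the one reported (32,081 cases), intervals of this kind are expected to be narrow, which is consistent with the tight CIs in the Results.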
Conclusion: Our machine-learning model showed excellent discriminatory performance in predicting bacteremia based only on clinical features available at ED triage. If successfully implemented in the ED, it has the potential to improve care quality and save more lives.
About the journal:
WestJEM focuses on how the systems and delivery of emergency care affect health, health disparities, and health outcomes in communities and populations worldwide, including the impact of social conditions on the composition of patients seeking care in emergency departments.