Machine learning for predicting severe dengue in Puerto Rico.

IF 5.5 | Zone 1 (Medicine)

Infectious Diseases of Poverty | Pub date: 2025-02-04 | DOI: 10.1186/s40249-025-01273-0

Zachary J Madewell, Dania M Rodriguez, Maile B Thayer, Vanessa Rivera-Amill, Gabriela Paz-Bailey, Laura E Adams, Joshua M Wong

Infectious Diseases of Poverty 14(1): 5 (2025). Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11796212/pdf/


Abstract

Background: Distinguishing between non-severe and severe dengue is crucial for timely intervention and for reducing morbidity and mortality. World Health Organization (WHO)-recommended warning signs offer clinicians a practical approach but have limited sensitivity and specificity. This study evaluates the performance of machine learning (ML) models, compared with WHO-recommended warning signs, in predicting severe dengue among laboratory-confirmed cases in Puerto Rico.

Methods: We analyzed data from Puerto Rico's Sentinel Enhanced Dengue Surveillance System (May 2012-August 2024), using 40 clinical, demographic, and laboratory variables. Nine ML models (Decision Trees, K-Nearest Neighbors, Naïve Bayes, Support Vector Machines, Artificial Neural Networks, AdaBoost, CatBoost, LightGBM, and XGBoost) were trained using five-fold cross-validation and evaluated by area under the receiver operating characteristic curve (AUC-ROC), sensitivity, and specificity. A subanalysis excluded hemoconcentration and leukopenia to assess performance in resource-limited settings. An AUC-ROC of 0.5 (50%) indicates no discriminative power, while values closer to 1.0 (100%) indicate better discrimination.
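The evaluation setup can be illustrated with a minimal Python sketch (standard library only). It computes AUC-ROC as the Mann-Whitney probability that a randomly chosen severe case receives a higher risk score than a randomly chosen non-severe case, which is why 0.5 means no discrimination, and it deals cases into five cross-validation folds. Stratifying the folds by outcome, the function names, and the toy labels are assumptions for illustration; the abstract does not say how the authors built their folds or which software they used.

```python
import random
from typing import List, Sequence, Tuple


def auc_roc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Rank-based (Mann-Whitney) AUC-ROC.

    Probability that a randomly chosen positive case (severe = 1) has a higher
    score than a randomly chosen negative case (non-severe = 0); ties count 0.5.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one severe and one non-severe case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def stratified_kfold(
    labels: Sequence[int], k: int = 5, seed: int = 0
) -> List[Tuple[List[int], List[int]]]:
    """Return k (train_indices, test_indices) pairs.

    Indices of each class are shuffled and dealt round-robin across folds
    (assumption: stratified splitting), so every test fold keeps roughly the
    overall proportion of severe cases.
    """
    rng = random.Random(seed)
    folds: List[List[int]] = [[] for _ in range(k)]
    slot = 0  # running position so fold sizes stay balanced across classes
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for i in idx:
            folds[slot % k].append(i)
            slot += 1
    splits = []
    for f in range(k):
        test = sorted(folds[f])
        train = sorted(i for g in range(k) if g != f for i in folds[g])
        splits.append((train, test))
    return splits


if __name__ == "__main__":
    print(auc_roc([1, 1, 0, 0], [0.9, 0.8, 0.3, 0.1]))  # perfect ranking: 1.0
    print(auc_roc([1, 0], [0.5, 0.5]))                  # tie: 0.5 (no discrimination)
    y = [1, 0, 0, 0] * 5  # toy labels, not study data
    for train, test in stratified_kfold(y, k=5):
        print(len(train), len(test), sum(y[i] for i in test))
```

The pairwise loop is O(n_pos × n_neg), which is manageable for a cohort of about 1,700 cases; a library routine such as scikit-learn's `roc_auc_score` returns the same value more efficiently.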

Results: Among the 1,708 laboratory-confirmed dengue cases, 24.3% were classified as severe. Gradient boosting algorithms achieved the highest predictive performance; CatBoost reached an AUC-ROC of 97.1% (95% CI: 96.0-98.3%) using the full 40-variable feature set. Feature importance analysis identified hemoconcentration (≥ 20% increase during illness or ≥ 20% above baseline for age and sex), leukopenia (white blood cell count < 4000/mm³), and presentation 4-6 days after symptom onset as key predictors. When hemoconcentration and leukopenia were excluded, the CatBoost AUC-ROC was 96.7% (95% CI: 95.5-98.0%), a minimal reduction in performance. Individual warning signs such as abdominal pain and restlessness had sensitivities of 79.0% and 64.6% but lower specificities of 48.4% and 59.1%, respectively. Requiring ≥ 3 warning signs improved specificity (80.9%) while maintaining moderate sensitivity (78.6%), with an AUC-ROC of 74.0%.
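The two laboratory predictors and the warning-sign threshold rule reduce to simple checks. The sketch below assumes hemoconcentration is assessed from serial hematocrit values listed in time order, with an age- and sex-specific reference hematocrit supplied by the caller; `hemoconcentration`, `leukopenia`, `warning_sign_rule`, and `sensitivity_specificity` are hypothetical helpers, not the authors' code, and the toy cohort is invented.

```python
from typing import List, Optional, Sequence, Tuple


def hemoconcentration(
    hct_series: Sequence[float], reference_hct: Optional[float] = None
) -> bool:
    """Hemoconcentration flag (assumed to be based on hematocrit, in %).

    True if a later value is >= 20% above the lowest earlier value (a rise
    during illness), or if any value is >= 20% above an age/sex reference
    hematocrit supplied by the caller (hypothetical parameter).
    """
    if not hct_series:
        return False
    lowest = hct_series[0]
    for value in hct_series[1:]:
        if value >= 1.20 * lowest:
            return True
        lowest = min(lowest, value)
    if reference_hct is not None:
        return max(hct_series) >= 1.20 * reference_hct
    return False


def leukopenia(wbc_per_mm3: float) -> bool:
    """White blood cell count below 4000 cells/mm3."""
    return wbc_per_mm3 < 4000


def warning_sign_rule(sign_counts: Sequence[int], threshold: int = 3) -> List[int]:
    """Predict severe (1) when a patient has at least `threshold` warning signs."""
    return [1 if count >= threshold else 0 for count in sign_counts]


def sensitivity_specificity(
    y_true: Sequence[int], y_pred: Sequence[int]
) -> Tuple[float, float]:
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


if __name__ == "__main__":
    # Toy cohort (invented): 1 = severe, with each patient's warning-sign count.
    y_true = [1, 1, 1, 1, 0, 0, 0, 0]
    counts = [4, 3, 1, 5, 0, 3, 1, 2]
    sens, spec = sensitivity_specificity(y_true, warning_sign_rule(counts))
    print(sens, spec)  # 0.75 0.75
    print(hemoconcentration([36.0, 38.0, 45.0]))  # True: 45 >= 1.2 * 36
    print(leukopenia(3500))  # True
```

Raising `threshold` trades sensitivity for specificity, the same direction of change the abstract reports when moving from individual warning signs to ≥ 3 signs.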

Conclusions: ML models, especially gradient boosting algorithms, outperformed traditional warning signs in predicting severe dengue. Integrating these models into clinical decision-support tools could help clinicians identify high-risk patients and guide timely interventions such as hospitalization, closer monitoring, or intravenous fluid administration. The subanalysis excluding hemoconcentration and leukopenia supports the models' applicability in resource-limited settings, where access to laboratory data may be limited.




Source journal

Infectious Diseases of Poverty (category: Infectious Diseases)

Self-citation rate: 1.20%

Articles published: 368

About the journal: Infectious Diseases of Poverty is an open-access, peer-reviewed journal that addresses essential public health questions related to infectious diseases of poverty. It covers a wide range of topics, including the biology of pathogens and vectors, diagnosis and detection, treatment and case management, epidemiology and modeling, zoonotic hosts and animal reservoirs, control strategies and implementation, and new technologies and their application. It also considers transdisciplinary or multisectoral effects on health systems, ecohealth, environmental management, and innovative technology. The journal aims to identify and assess the research and information gaps that hinder progress toward new interventions for public health problems in the developing world, and it provides a platform for discussing these issues to advance research and evidence building for better public health interventions in poor settings.