Predicting Mortality in Hospitalized COVID-19 Patients in Zambia: An Application of Machine Learning.

IF 1.1 Q4 PUBLIC, ENVIRONMENTAL & OCCUPATIONAL HEALTH

Global Health Epidemiology and Genomics. Pub Date: 2023-05-22. eCollection Date: 2023-01-01. DOI: 10.1155/2023/8921220

Clyde Mulenga, Patrick Kaonga, Raymond Hamoonga, Mazyanga Lucy Mazaba, Freeman Chabala, Patrick Musonda



Abstract

Coronavirus disease 2019 (COVID-19) has wreaked havoc globally, resulting in millions of cases and deaths. The objective of this study was to predict mortality in hospitalized COVID-19 patients in Zambia using machine learning (ML) methods, based on factors that have been shown to be predictive of mortality, and thereby improve pandemic preparedness. The study employed seven ML models: decision tree (DT), random forest (RF), support vector machine (SVM), logistic regression (LR), Naïve Bayes (NB), gradient boosting (GB), and XGBoost (XGB). These classifiers were trained on data from 1,433 hospitalized COVID-19 patients from various health facilities in Zambia. Model performance was assessed using accuracy, recall, F1-score, area under the receiver operating characteristic curve (ROC_AUC), area under the precision-recall curve (PRC_AUC), and other metrics. The best-performing model was XGB, with an accuracy of 92.3%, recall of 94.2%, F1-score of 92.4%, and ROC_AUC of 97.5%. Pairwise Mann-Whitney U tests showed that the second-best model (GB) and the third-best model (RF) did not perform significantly worse than XGB. GB had an accuracy of 91.7%, recall of 94.2%, F1-score of 91.9%, and ROC_AUC of 97.1%. RF had an accuracy of 90.8%, recall of 93.6%, F1-score of 91.0%, and ROC_AUC of 96.8%. The other models showed similar results on the same metrics. The study derived and validated the selected ML models, which predicted mortality with reasonably high performance on the stated metrics. The feature importance analysis found that knowledge of patients' underlying health conditions, hospital length of stay (LOS), white blood cell count, age, and other factors can help healthcare providers offer lifesaving services on time, improve pandemic preparedness, and decongest health facilities in Zambia and in other countries with similar settings.
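The metrics named in the abstract can be illustrated with a minimal, dependency-free sketch. This is not the authors' pipeline: the function names and the eight toy labels and scores below are invented for illustration, and death is assumed to be coded as the positive class (1). ROC_AUC is computed here as the fraction of (positive, negative) pairs that the score ranks correctly, which is the Mann-Whitney U statistic divided by the number of such pairs.

```python
from typing import Dict, Sequence


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, recall, precision and F1 for binary labels (1 = positive class)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    accuracy = (tp + tn) / len(y_true)
    recall = tp / (tp + fn) if (tp + fn) else 0.0        # sensitivity
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)              # harmonic mean
    return {"accuracy": accuracy, "recall": recall,
            "precision": precision, "f1": f1}


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as the share of (positive, negative) pairs ranked correctly.

    Ties count as half a win. The numerator is the Mann-Whitney U statistic
    of the positive scores against the negative scores.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("ROC AUC needs at least one positive and one negative")
    u = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return u / (len(pos) * len(neg))


# Hypothetical toy data, not taken from the study.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.1, 0.7, 0.2]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 decision threshold

metrics = classification_metrics(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(metrics, auc)
```

Accuracy, recall, and F1 depend on the chosen decision threshold, whereas ROC_AUC depends only on how the scores rank patients, which is one reason a study may report both kinds of metric side by side.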


