Title: Understanding malaria dynamics: Insights from interpretable machine learning in Kelem Wollega Zone, Ethiopia
Authors: Yohannes Dhuguma, Solomon Tekalign, Tegegne Sishaw, Sitotaw Haile Erena, Ashenafi Yimam, Kidist Demessie
Journal: Scientific African, volume 30, Article e02963
Published: 2025-09-09
DOI: 10.1016/j.sciaf.2025.e02963
URL: https://www.sciencedirect.com/science/article/pii/S2468227625004338
Citation count: 0
Abstract
Problem
Malaria, a long-standing global health problem, thrives in tropical and subtropical climates. Because it is Ethiopia's deadliest parasitic disease, an in-depth investigation is needed.
Aim
This study addresses this need by building a model that classifies malaria outbreaks in the Kelem Wollega Zone of Ethiopia from historical climate patterns. Four machine learning algorithms are evaluated: extreme gradient boosting (XGB), random forest (RF), gradient boosting (GB), and support vector machines (SVM).
Methods
During training, the models are evaluated with five-fold cross-validation and rigorous hyperparameter selection. SHAP (SHapley Additive exPlanations) and LIME (Local Interpretable Model-agnostic Explanations) are applied to the two best-performing models to interpret them globally and locally. A surrogate decision tree model is used to examine the balance between interpretability and accuracy. Performance is assessed using the mean area under the curve (mean AUC), mean F1 score, sensitivity, and specificity.
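The evaluation metrics named above can be sketched in a few lines of standard-library Python. This is a minimal illustration, not the authors' code: it assumes binary labels (1 = outbreak, 0 = no outbreak), and all function names are hypothetical. Model training is omitted; `mean_over_folds` only averages a metric over held-out folds that are supplied as (true labels, predictions) pairs.

```python
from statistics import mean


def confusion_counts(y_true, y_pred):
    """Return (tp, fp, tn, fn) for binary labels where 1 = outbreak."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, fp, tn, fn


def sensitivity(y_true, y_pred):
    """True positive rate: share of real outbreaks that were flagged."""
    tp, _, _, fn = confusion_counts(y_true, y_pred)
    return tp / (tp + fn) if (tp + fn) else 0.0


def specificity(y_true, y_pred):
    """True negative rate: share of non-outbreaks correctly left unflagged."""
    _, fp, tn, _ = confusion_counts(y_true, y_pred)
    return tn / (tn + fp) if (tn + fp) else 0.0


def f1_score(y_true, y_pred):
    """Harmonic mean of precision and recall, written as 2TP / (2TP + FP + FN)."""
    tp, fp, _, fn = confusion_counts(y_true, y_pred)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def auc(y_true, scores):
    """AUC as the probability that a random positive outranks a random
    negative, counting ties as one half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def mean_over_folds(folds, metric):
    """Average a metric over held-out folds given as (y_true, y_hat) pairs."""
    return mean(metric(y_true, y_hat) for y_true, y_hat in folds)
```

With five held-out folds, `mean_over_folds(folds, auc)` and `mean_over_folds(folds, f1_score)` would correspond to the mean AUC and mean F1 reported in the abstract, under the assumptions stated above.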
Results
The best performance is obtained for Dale Wabera, where XGB achieves a mean AUC of 0.99, a mean F1 score of 0.95, a sensitivity of 1.00, and a specificity of 0.99. According to SHAP, the predictions of the XGB and GB models were most strongly influenced by the date, minimum temperature, maximum temperature, and top-layer soil moisture. This result is consistent with real-world malaria epidemic scenarios. LIME provides local interpretability for individual cases, and its explanations agree with the detailed relationship between environmental variables and malaria outbreaks. The trade-off analysis also shows that XGB models typically attain high accuracy, although occasionally at the cost of fidelity.
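The accuracy versus fidelity trade-off can be made concrete with a small sketch. The example below is illustrative only and uses invented toy data: `black_box` is a hypothetical stand-in for a trained classifier such as XGB, the two features (minimum temperature, top-layer soil moisture) and their values are assumptions, and the surrogate is a depth-1 decision tree (a stump) rather than the tree used in the paper. Fidelity is measured as agreement between the surrogate and the black box; accuracy is agreement with the true labels.

```python
def agreement(a, b):
    """Fraction of positions where two label sequences match."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def fit_stump(X, targets):
    """Fit a depth-1 decision tree by exhaustive search.

    Tries every (feature, threshold, polarity) and keeps the first one with
    the highest agreement with `targets`. Returns a predict function.
    """
    best = None
    for j in range(len(X[0])):
        for thr in sorted({row[j] for row in X}):
            for polarity in (1, 0):
                preds = [polarity if row[j] > thr else 1 - polarity
                         for row in X]
                score = agreement(preds, targets)
                if best is None or score > best[0]:
                    best = (score, j, thr, polarity)
    _, j, thr, polarity = best
    return lambda row: polarity if row[j] > thr else 1 - polarity


def black_box(row):
    """Hypothetical stand-in for a complex model: flags an outbreak only
    when both minimum temperature and soil moisture are high."""
    tmin, soil = row
    return 1 if tmin > 15 and soil > 0.3 else 0


# Toy data (invented): (minimum temperature in deg C, soil moisture fraction).
X = [(12, 0.2), (14, 0.5), (16, 0.2), (16, 0.4),
     (18, 0.5), (20, 0.35), (13, 0.45), (19, 0.1)]
y_true = [0, 0, 0, 1, 1, 1, 0, 1]

bb_preds = [black_box(row) for row in X]

# The surrogate is trained to mimic the black box, not the true labels.
surrogate = fit_stump(X, bb_preds)
sur_preds = [surrogate(row) for row in X]

black_box_accuracy = agreement(bb_preds, y_true)
fidelity = agreement(sur_preds, bb_preds)
surrogate_accuracy = agreement(sur_preds, y_true)
```

On this toy data a single split cannot reproduce a rule that depends on two features, so the surrogate's fidelity stays below 1 even though it remains easy to read. A deeper surrogate would raise fidelity at the cost of simplicity, which is the kind of balance the abstract describes.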
Conclusion
These results demonstrate the importance of interpreting each model's output in depth, both locally and globally, which clarifies how individual features contribute to the predictions.