Interpretable prediction of acute respiratory infection disease among under-five children in Ethiopia using ensemble machine learning and Shapley additive explanations (SHAP).
{"title":"Interpretable prediction of acute respiratory infection disease among under-five children in Ethiopia using ensemble machine learning and Shapley additive explanations (SHAP).","authors":"Zinabu Bekele Tadese, Debela Tsegaye Hailu, Aschale Wubete Abebe, Shimels Derso Kebede, Agmasie Damtew Walle, Beminate Lemma Seifu, Teshome Demis Nimani","doi":"10.1177/20552076241272739","DOIUrl":null,"url":null,"abstract":"<p><strong>Background: </strong>Although the prevalence of childhood illnesses has significantly decreased, acute respiratory infections continue to be the leading cause of death and disease among children in low- and middle-income countries. Seven percent of children under five experienced symptoms in the two weeks preceding the Ethiopian demographic and health survey. Hence, this study aimed to identify interpretable predicting factors of acute respiratory infection disease among under-five children in Ethiopia using machine learning analysis techniques.</p><p><strong>Methods: </strong>Secondary data analysis was performed using 2016 Ethiopian demographic and health survey data. Data were extracted using STATA and imported into Jupyter Notebook for further analysis. The presence of acute respiratory infection in a child under the age of 5 was the outcome variable, categorized as yes and no. Five ensemble boosting machine learning algorithms such as adaptive boosting (AdaBoost), extreme gradient boosting (XGBoost), Gradient Boost, CatBoost, and light gradient-boosting machine (LightGBM) were employed on a total sample of 10,641 children under the age of 5. The Shapley additive explanations technique was used to identify the important features and effects of each feature driving the prediction.</p><p><strong>Results: </strong><b>The</b> XGBoost model achieved an accuracy of 79.3%, an F1 score of 78.4%, a recall of 78.3%, a precision of 81.7%, and a receiver operating curve area under the curve of 86.1% after model optimization. Child age (month), history of diarrhea, number of living children, duration of breastfeeding, and mother's occupation were the top predicting factors of acute respiratory infection among children under the age of 5 in Ethiopia.</p><p><strong>Conclusion: </strong>The XGBoost classifier was the best predictive model with improved performance, and predicting factors of acute respiratory infection were identified with the help of the Shapely additive explanation. The findings of this study can help policymakers and stakeholders understand the decision-making process for acute respiratory infection prevention among under-five children in Ethiopia.</p>","PeriodicalId":51333,"journal":{"name":"DIGITAL HEALTH","volume":null,"pages":null},"PeriodicalIF":2.9000,"publicationDate":"2024-08-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11304488/pdf/","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"DIGITAL HEALTH","FirstCategoryId":"3","ListUrlMain":"https://doi.org/10.1177/20552076241272739","RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2024/1/1 0:00:00","PubModel":"eCollection","JCR":"Q2","JCRName":"HEALTH CARE SCIENCES & SERVICES","Score":null,"Total":0}
引用次数: 0
Abstract
Background: Although the prevalence of childhood illnesses has significantly decreased, acute respiratory infections continue to be the leading cause of death and disease among children in low- and middle-income countries. Seven percent of children under five experienced symptoms in the two weeks preceding the Ethiopian demographic and health survey. Hence, this study aimed to identify interpretable predicting factors of acute respiratory infection disease among under-five children in Ethiopia using machine learning analysis techniques.
Methods: Secondary data analysis was performed using 2016 Ethiopian demographic and health survey data. Data were extracted using STATA and imported into Jupyter Notebook for further analysis. The presence of acute respiratory infection in a child under the age of 5 was the outcome variable, categorized as yes and no. Five ensemble boosting machine learning algorithms such as adaptive boosting (AdaBoost), extreme gradient boosting (XGBoost), Gradient Boost, CatBoost, and light gradient-boosting machine (LightGBM) were employed on a total sample of 10,641 children under the age of 5. The Shapley additive explanations technique was used to identify the important features and effects of each feature driving the prediction.
Results: The XGBoost model achieved an accuracy of 79.3%, an F1 score of 78.4%, a recall of 78.3%, a precision of 81.7%, and a receiver operating curve area under the curve of 86.1% after model optimization. Child age (month), history of diarrhea, number of living children, duration of breastfeeding, and mother's occupation were the top predicting factors of acute respiratory infection among children under the age of 5 in Ethiopia.
Conclusion: The XGBoost classifier was the best predictive model with improved performance, and predicting factors of acute respiratory infection were identified with the help of the Shapely additive explanation. The findings of this study can help policymakers and stakeholders understand the decision-making process for acute respiratory infection prevention among under-five children in Ethiopia.