Interpretable prediction of acute respiratory infection disease among under-five children in Ethiopia using ensemble machine learning and Shapley additive explanations (SHAP).

IF 2.9 · Zone 3 (Medicine) · JCR Q2 · HEALTH CARE SCIENCES & SERVICES
DIGITAL HEALTH Pub Date : 2024-08-06 eCollection Date: 2024-01-01 DOI:10.1177/20552076241272739
Zinabu Bekele Tadese, Debela Tsegaye Hailu, Aschale Wubete Abebe, Shimels Derso Kebede, Agmasie Damtew Walle, Beminate Lemma Seifu, Teshome Demis Nimani
Full text (PMC PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11304488/pdf/
Citations: 0

Abstract

Background: Although the prevalence of childhood illnesses has decreased significantly, acute respiratory infections remain the leading cause of death and disease among children in low- and middle-income countries. Seven percent of children under five experienced symptoms of acute respiratory infection in the two weeks preceding the Ethiopian Demographic and Health Survey. This study therefore aimed to identify interpretable predictors of acute respiratory infection among under-five children in Ethiopia using machine learning techniques.

Methods: Secondary data analysis was performed using data from the 2016 Ethiopian Demographic and Health Survey. Data were extracted using STATA and imported into Jupyter Notebook for further analysis. The outcome variable was the presence of acute respiratory infection in a child under the age of 5, coded as yes or no. Five ensemble boosting machine learning algorithms were applied to a total sample of 10,641 children under the age of 5: adaptive boosting (AdaBoost), extreme gradient boosting (XGBoost), Gradient Boost, CatBoost, and light gradient-boosting machine (LightGBM). The Shapley additive explanations (SHAP) technique was used to identify the important features and the effect of each feature on the prediction.
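
To make the Shapley attribution idea concrete, the sketch below computes exact Shapley values for a single prediction by enumerating feature coalitions. It is only an illustration under stated assumptions: `toy_risk`, its coefficients, the example child, and the baseline are all hypothetical and are not taken from the study, and the study itself would have used a SHAP implementation on the fitted boosting models rather than brute-force enumeration.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values for one instance x relative to a baseline instance.

    Features outside a coalition are replaced by their baseline value.
    The cost grows as 2^n, so this is only practical for a few features.
    """
    n = len(x)
    features = range(n)

    def value(coalition):
        # Model output when only the coalition's features take x's values.
        mixed = [x[i] if i in coalition else baseline[i] for i in features]
        return predict(mixed)

    phi = [0.0] * n
    for i in features:
        others = [j for j in features if j != i]
        for size in range(n):
            # Shapley weight for coalitions of this size that exclude feature i.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def toy_risk(features):
    """Hypothetical risk score; the coefficients are invented for illustration."""
    age_months, had_diarrhea, living_children = features
    score = 0.30 - 0.004 * age_months + 0.15 * had_diarrhea + 0.02 * living_children
    # Hypothetical interaction: diarrhea adds extra risk for infants under 12 months.
    score += 0.05 * had_diarrhea * (age_months < 12)
    return score


child = [6, 1, 4]      # hypothetical child: 6 months old, diarrhea, 4 living children
baseline = [30, 0, 3]  # hypothetical reference child
phi = shapley_values(toy_risk, child, baseline)

for name, contribution in zip(["age_months", "had_diarrhea", "living_children"], phi):
    print(f"{name}: {contribution:+.3f}")

# Additivity: the contributions sum to the gap between the two predictions.
assert abs(sum(phi) - (toy_risk(child) - toy_risk(baseline))) < 1e-9
```

The closing assertion checks the additivity property that gives the method its name: the per-feature contributions account exactly for the difference between the explained prediction and the baseline prediction. The interaction term is split evenly between the two features that produce it.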

Results: After model optimization, the XGBoost model achieved an accuracy of 79.3%, an F1 score of 78.4%, a recall of 78.3%, a precision of 81.7%, and an area under the receiver operating characteristic curve (AUC) of 86.1%. Child age (in months), history of diarrhea, number of living children, duration of breastfeeding, and mother's occupation were the top predictors of acute respiratory infection among children under the age of 5 in Ethiopia.
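
For readers less familiar with the reported metrics, the following minimal sketch shows how accuracy, precision, recall, F1, and AUC are defined for a binary outcome. The labels, scores, and 0.5 threshold are made up for illustration and are not the study's data. The sketch uses positive-class definitions, whereas the abstract does not state which averaging scheme the authors used, so the figures above need not follow these exact formulas.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for binary labels (1 = infection present)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


def roc_auc(y_true, scores):
    """AUC as the probability that a random positive case is scored above a
    random negative case, with ties counted as one half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Made-up labels and predicted probabilities, for illustration only.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.6, 0.3, 0.7, 0.4, 0.2, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 decision threshold

metrics = classification_metrics(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(metrics, auc)
```

Accuracy, precision, recall, and F1 depend on the chosen decision threshold, while AUC is computed from the ranking of the scores alone, which is why it is reported separately.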

Conclusion: The XGBoost classifier was the best-performing predictive model, and the predictors of acute respiratory infection were identified with the help of Shapley additive explanations. The findings of this study can help policymakers and stakeholders understand the decision-making process for acute respiratory infection prevention among under-five children in Ethiopia.

Source journal: DIGITAL HEALTH
CiteScore: 2.90
Self-citation rate: 7.70%
Articles published: 302