Comparative analysis of machine learning algorithms for predicting diarrhea among under-five children in Ethiopia: Evidence from 2016 EDHS


Health Informatics Journal. Pub Date: 2024-09-14. DOI: 10.1177/14604582241285769

Alemu Birara Zemariam, Wondosen Abey, Abdulaziz Kebede Kassaw, Ali Yimer



Abstract

Background: Diarrhea is a major cause of mortality and morbidity in under-5 children globally, especially in developing countries like Ethiopia. Limited research has used machine learning (ML) to predict childhood diarrhea. This study aimed to compare the predictive performance of ML algorithms for diarrhea in under-5 children in Ethiopia.

Methods: The study used a dataset of 9501 under-5 children from the 2016 Ethiopia Demographic and Health Survey (EDHS). Five ML algorithms were used to build and compare predictive models. Model performance was evaluated in Python using metrics including accuracy, sensitivity, specificity, and AUC. Boruta feature selection was employed, and data balancing techniques (under-sampling, over-sampling, adaptive synthetic sampling, and synthetic minority oversampling) as well as hyperparameter tuning methods were explored. Association rule mining was conducted using the Apriori algorithm in R to determine relationships between the independent variables and the target variable.

Results: 10.2% of children had diarrhea. The Random Forest (RF) model had the best performance, with 93.2% accuracy, 98.4% sensitivity, 85.5% specificity, and 0.916 AUC. The top predictors were residence, wealth index, child age, number of living children, deworming, wasting, mother's occupation, and education. Association rule mining identified the top 7 rules most associated with under-5 diarrhea in Ethiopia.

Conclusion: The RF model achieved the highest performance for predicting childhood diarrhea. Policymakers and healthcare providers can use these findings to develop targeted interventions to reduce diarrhea. Customizing strategies based on the identified association rules has the potential to improve child health and decrease the impact of diarrhea in Ethiopia.
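To make the reported metrics and the rule-mining step concrete, the following is a minimal, standard-library Python sketch. It is not the authors' code: the function names `classification_metrics` and `rule_measures`, the toy labels, and the toy transactions are all hypothetical and are not taken from the EDHS data. The first helper computes accuracy, sensitivity, and specificity from binary labels; the second computes the support, confidence, and lift that Apriori-style rule mining typically ranks rules by.

```python
from collections import namedtuple

Metrics = namedtuple("Metrics", "accuracy sensitivity specificity")


def classification_metrics(y_true, y_pred):
    """Accuracy, sensitivity and specificity for 0/1 labels (1 = diarrhea)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    accuracy = (tp + tn) / len(pairs)
    # Sensitivity: share of true cases that the model flags.
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    # Specificity: share of non-cases that the model correctly clears.
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return Metrics(accuracy, sensitivity, specificity)


def rule_measures(transactions, antecedent, consequent):
    """Support, confidence and lift of the rule antecedent -> consequent."""
    a, c = set(antecedent), set(consequent)
    n = len(transactions)
    n_a = sum(1 for t in transactions if a <= t)
    n_c = sum(1 for t in transactions if c <= t)
    n_ac = sum(1 for t in transactions if (a | c) <= t)
    support = n_ac / n
    confidence = n_ac / n_a if n_a else 0.0
    lift = confidence / (n_c / n) if n_c else 0.0
    return support, confidence, lift


# Hypothetical sample of 1000 children with 10.2% positives, scored by a
# trivial "model" that always predicts "no diarrhea".
y_true = [1] * 102 + [0] * 898
always_negative = [0] * 1000
print(classification_metrics(y_true, always_negative))

# Hypothetical records, each a set of attribute values for one child.
transactions = [
    {"rural", "poor", "diarrhea"},
    {"rural", "poor"},
    {"rural", "diarrhea"},
    {"urban"},
    {"urban", "poor"},
    {"rural", "poor", "diarrhea"},
    {"urban"},
    {"rural"},
]
print(rule_measures(transactions, {"rural", "poor"}, {"diarrhea"}))
```

At the reported 10.2% prevalence, the always-negative baseline reaches 89.8% accuracy with zero sensitivity, which is why sensitivity, specificity, and AUC are reported alongside accuracy and why class balancing was explored before model training.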

