Optimizing machine learning models for predicting anemia among under-five children in Ethiopia: insights from Ethiopian demographic and health survey data.
Ali Yimer, Hassen Ahmed Yesuf, Sada Ahmed, Alemu Birara Zemariam, Endris Mussa, Nurye Sirage, Adem Yesuf, Abdulaziz Kebede Kassaw
BMC Pediatrics, 25(1):311. Published 2025-04-22. DOI: 10.1186/s12887-025-05659-9
Abstract
Background: Healthcare practitioners require a robust predictive system to accurately diagnose diseases, especially in young children with conditions such as anemia. Delays in diagnosis and treatment can have severe consequences, potentially leading to serious complications and childhood mortality. By leveraging machine learning methods with extensive datasets, valuable and scientifically sound insights can be generated to address pressing health and healthcare-related challenges.
Objectives: The primary objective of this study was to identify the most effective machine-learning algorithm for predicting anemia among children under five in Ethiopia.
Methods: Data were drawn from the 2016 Ethiopian Demographic and Health Survey. Six machine-learning models (classical logistic regression, random forest, decision tree, support vector machine, Naïve Bayes, and K-nearest neighbors) were used to predict anemia in children under five. Each model's predictive performance was evaluated using receiver operating characteristic curves and several measures of model accuracy.
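As a rough illustration of the receiver operating characteristic evaluation mentioned in the Methods, the area under the curve can be read as the probability that a randomly chosen anemic child receives a higher predicted score than a randomly chosen non-anemic child. The sketch below computes it that way in plain Python. The function name `roc_auc` and the labels and scores are hypothetical illustration values, not data or code from the study.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    Counts the fraction of (positive, negative) pairs in which the
    positive case has the higher score; ties count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical example: 1 = anemic, 0 = not anemic (not study data).
labels = [1, 0, 1, 1, 0, 0]
scores = [0.9, 0.4, 0.35, 0.8, 0.2, 0.7]
auc = roc_auc(labels, scores)
print(f"AUC = {auc:.3f}")
```

The pairwise form is quadratic in the number of cases, so it suits small illustrations; a rank-based computation would be the usual choice for survey-sized data.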
Results: The random forest model demonstrated the highest accuracy among the algorithms tested, achieving an overall accuracy of 81.16%. The accuracy rates for the decision tree, support vector machines, Naïve Bayes, K-nearest neighbors, and classical logistic regression models were 68.40%, 59.94%, 53.06%, 69.96%, and 54.79%, respectively.
Conclusion: In general, the random forest algorithm emerged as the preferred model for predicting anemia in children under five. The model exhibited a specificity of 79.26%, sensitivity of 83.07%, positive predictive value of 80.02%, negative predictive value of 82.40%, and an area under the curve of 81.80%.
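The measures reported above (sensitivity, specificity, positive and negative predictive value, and accuracy) all derive from the four cells of a binary confusion matrix. The following is a minimal sketch of those standard definitions, assuming "positive" means anemic. The class `ConfusionCounts`, the function `summarize`, and the counts are hypothetical and are not taken from the study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    tp: int  # anemic, predicted anemic
    fp: int  # not anemic, predicted anemic
    tn: int  # not anemic, predicted not anemic
    fn: int  # anemic, predicted not anemic


def summarize(c: ConfusionCounts) -> dict:
    """Return the standard binary-classification measures as fractions."""
    total = c.tp + c.fp + c.tn + c.fn
    return {
        "sensitivity": c.tp / (c.tp + c.fn),  # true positive rate
        "specificity": c.tn / (c.tn + c.fp),  # true negative rate
        "ppv": c.tp / (c.tp + c.fp),          # positive predictive value
        "npv": c.tn / (c.tn + c.fn),          # negative predictive value
        "accuracy": (c.tp + c.tn) / total,
    }


# Hypothetical counts for illustration only (not the study's results).
example = ConfusionCounts(tp=80, fp=20, tn=75, fn=25)
metrics = summarize(example)
for name, value in metrics.items():
    print(f"{name}: {value:.2%}")
```

Predictive values depend on how common anemia is in the evaluated sample, so they are not directly comparable across datasets with different prevalence, whereas sensitivity and specificity are.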
About the journal:
BMC Pediatrics is an open access journal publishing peer-reviewed research articles in all aspects of health care in neonates, children and adolescents, as well as related molecular genetics, pathophysiology, and epidemiology.