Identification of factors associated with acute malnutrition in children under 5 years and forecasting future prevalence: assessing the potential of statistical and machine learning methods.
Meike Reusken, Christopher Coffey, Frans Cruijssen, Bertrand Melenberg, Cascha van Wanrooij
BMJ Public Health 3(1): e001460. Published 2025-03-04. DOI: 10.1136/bmjph-2024-001460 (https://doi.org/10.1136/bmjph-2024-001460)
Abstract
Introduction: Eliminating acute malnutrition in children under 5 years of age is a critical health priority under United Nations Sustainable Development Goal 2, 'Zero Hunger', and requires targeted provision of treatment and preventive services. However, accurately forecasting the future prevalence of cases remains challenging, and predictive models have rarely been applied to the problem. To address this gap, this paper aims to identify factors associated with Global Acute Malnutrition (GAM) and to explore the potential of machine learning for predicting its prevalence, using data from Somalia.
Methods: Survey data on GAM prevalence, collected systematically at district level in Somalia every 6 months from 2017 to 2021, were collated with a range of potential climatic, demographic, disease, environmental, conflict and food security-related factors over the same period. We conducted simple and multiple statistical analyses, both parametric and non-parametric, to identify factors associated with GAM for use as inputs when forecasting future GAM prevalence. We then applied tree-based machine learning algorithms to a dataset comprising the GAM prevalence estimates and associated factors, aiming to forecast the trajectory and fluctuations of GAM prevalence 6 months ahead.
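Because surveys arrive every 6 months, forecasting 6 months ahead can be framed as supervised learning in which each district's factors in one survey round are paired with its GAM prevalence in the next round. The Python sketch below illustrates that framing under stated assumptions: the record layout, the function names `make_forecast_pairs` and `chronological_split`, the `rain_mm` feature, the use of current GAM as a predictor and all values are hypothetical, and the abstract does not describe how the authors actually assembled their dataset.

```python
from collections import defaultdict

# Hypothetical record layout: (district, survey_round, features, gam_prevalence_pct).
# survey_round is assumed to be a consecutive integer index, one per 6-month survey.
Record = tuple[str, int, dict[str, float], float]
# (district, round t, features at round t, GAM prevalence at round t + 1)
Pair = tuple[str, int, dict[str, float], float]


def make_forecast_pairs(records: list[Record]) -> list[Pair]:
    """Pair each district's features at round t with its GAM prevalence at round t + 1."""
    by_district: dict[str, dict[int, Record]] = defaultdict(dict)
    for rec in records:
        by_district[rec[0]][rec[1]] = rec

    pairs: list[Pair] = []
    for district, rounds in by_district.items():
        for t in sorted(rounds):
            _, _, features, gam_now = rounds[t]
            nxt = rounds.get(t + 1)
            if nxt is None:
                continue  # no survey 6 months later, so there is no target
            # Assumption: current GAM is used as a predictor alongside the other factors.
            x = dict(features, gam_current=gam_now)
            pairs.append((district, t, x, nxt[3]))
    return pairs


def chronological_split(pairs: list[Pair], last_train_round: int) -> tuple[list[Pair], list[Pair]]:
    """Train on earlier rounds and test on later ones (no random shuffling across time)."""
    train = [p for p in pairs if p[1] <= last_train_round]
    test = [p for p in pairs if p[1] > last_train_round]
    return train, test


# Toy data: district B is missing round 1, so neither of its rows gets a target.
records = [
    ("A", 0, {"rain_mm": 50.0}, 12.0),
    ("A", 1, {"rain_mm": 20.0}, 15.0),
    ("A", 2, {"rain_mm": 40.0}, 13.0),
    ("B", 0, {"rain_mm": 30.0}, 18.0),
    ("B", 2, {"rain_mm": 35.0}, 16.0),
]
pairs = make_forecast_pairs(records)
train, test = chronological_split(pairs, last_train_round=0)
print(len(pairs), len(train), len(test))  # 2 1 1
```

Splitting by survey round rather than at random keeps later observations out of training, which mirrors how a real 6-month-ahead forecast would be made. The training pairs could then be passed to a tree-based regressor, for example scikit-learn's `RandomForestRegressor`, although that is one possible tool rather than the implementation the authors report.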
Results: We found factors statistically associated with GAM prevalence relating to rainfall, land vegetation quality, food security status, crop production and demographics. Most of these associations were nonlinear, motivating the use of tree-based machine learning forecasts. Among the forecasting methods tested, random forest proved the most effective, accurately forecasting the direction of change in GAM prevalence in the test data for many districts in Somalia.
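The results judge forecasts partly by whether they capture the direction of GAM prevalence. One plausible way to score this, assumed here rather than taken from the paper, is the share of districts where the predicted change from the current round has the same sign as the observed change. The function name `direction_accuracy`, the `tol` dead-band parameter and the numbers below are hypothetical.

```python
def direction_accuracy(current, actual_next, predicted_next, tol=0.0):
    """Share of cases where the forecast change has the same sign as the observed change.

    current, actual_next, predicted_next: GAM prevalence (%) per district.
    tol: changes with |delta| <= tol count as 'no change' (hypothetical choice).
    """
    def sign(delta):
        if delta > tol:
            return 1
        if delta < -tol:
            return -1
        return 0

    if not (len(current) == len(actual_next) == len(predicted_next)):
        raise ValueError("inputs must have equal length")
    if not current:
        raise ValueError("no observations to score")
    hits = sum(
        sign(a - c) == sign(p - c)
        for c, a, p in zip(current, actual_next, predicted_next)
    )
    return hits / len(current)


# Toy example for four hypothetical districts.
current = [12.0, 15.0, 18.0, 10.0]
actual_next = [15.0, 13.0, 19.0, 9.0]
predicted_next = [14.0, 14.0, 17.5, 9.5]
score = direction_accuracy(current, actual_next, predicted_next)
print(score)  # 0.75: the third district's rise is forecast as a fall
```

A direction-based score complements level-based error measures such as mean absolute error: a forecast can be close in level yet wrong about whether prevalence is rising, which matters when deciding where to scale up services.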