A stacked ensemble machine learning model for the prediction of pentavalent 3 vaccination dropout in East Africa

IF 2.4 Q3 COMPUTER SCIENCE, INFORMATION SYSTEMS
Frontiers in Big Data Pub Date : 2025-04-07 eCollection Date: 2025-01-01 DOI:10.3389/fdata.2025.1522578
Meron Asmamaw Alemayehu, Shimels Derso Kebede, Agmasie Damtew Walle, Daniel Niguse Mamo, Ermias Bekele Enyew, Jibril Bashir Adem
Volume 8, article 1522578. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12009798/pdf/

Abstract

A stacked ensemble machine learning model for the prediction of pentavalent 3 vaccination dropout in East Africa.

Introduction: Vaccination is critical for reducing childhood mortality, yet completion rates for the third dose of the pentavalent vaccine (Penta 3) in East Africa remain inadequate. This study aims to predict Penta 3 vaccination dropout using a stacking ensemble machine learning model with Demographic and Health Survey (DHS) data. The objective is to identify predictors of dropout and enhance intervention strategies.

Methods: Seven base machine learning algorithms were combined into a stacked ensemble, and three meta-learners were compared: Random Forest (RF), Generalized Linear Model (GLM), and Extreme Gradient Boosting (XGBoost). The base learners and the stacked (super learner) ensembles were built with the H2O package. Two feature selection (FS) methods, LASSO and Boruta, were applied and compared. The selected features were one-hot encoded, with ordinal encoding applied where appropriate. Two hyperparameter optimization (HPO) strategies, grid search and random search, were applied and compared. Model performance was assessed using five metrics, including accuracy and the area under the curve (AUC). SHAP (SHapley Additive exPlanations) values were used to interpret the model outputs and identify influential predictors. Results are presented according to the experimental design.
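
To make the stacking idea concrete, the following is a minimal sketch in plain Python. It is not the authors' H2O pipeline: the data are synthetic, the two base learners are simple one-feature threshold models standing in for the seven algorithms, and the meta-learner is a small logistic regression (loosely analogous to a GLM meta-learner). All names (`Stump`, `LogisticMeta`, `out_of_fold`) are hypothetical and exist only for illustration. The point it shows is that the meta-learner is trained on out-of-fold predictions of the base learners rather than on the raw features.

```python
import math
import random


def make_toy_data(n=200, seed=0):
    """Synthetic, balanced binary data. NOT the DHS data used in the study."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = i % 2
        # Two noisy features whose means shift with the label.
        X.append([rng.gauss(label, 1.0), rng.gauss(-label, 1.0)])
        y.append(label)
    return X, y


class Stump:
    """Illustrative base learner: soft threshold at the midpoint of class means."""

    def __init__(self, feature):
        self.feature = feature

    def fit(self, X, y):
        pos = [row[self.feature] for row, t in zip(X, y) if t == 1]
        neg = [row[self.feature] for row, t in zip(X, y) if t == 0]
        self.mid = (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2
        self.sign = 1.0 if sum(pos) / len(pos) >= sum(neg) / len(neg) else -1.0
        return self

    def predict_proba(self, X):
        return [1 / (1 + math.exp(-self.sign * (row[self.feature] - self.mid)))
                for row in X]


class LogisticMeta:
    """Illustrative meta-learner: logistic regression by batch gradient descent."""

    def _p(self, z):
        return 1 / (1 + math.exp(-(sum(w * v for w, v in zip(self.w, z)) + self.b)))

    def fit(self, Z, y, lr=0.5, epochs=300):
        self.w, self.b = [0.0] * len(Z[0]), 0.0
        n = len(Z)
        for _ in range(epochs):
            errs = [self._p(z) - t for z, t in zip(Z, y)]
            grads = [sum(e * z[j] for e, z in zip(errs, Z)) / n
                     for j in range(len(self.w))]
            self.w = [w - lr * g for w, g in zip(self.w, grads)]
            self.b -= lr * sum(errs) / n
        return self

    def predict_proba(self, Z):
        return [self._p(z) for z in Z]


def out_of_fold(make_learner, X, y, k=5):
    """Predict each row with a learner that never saw that row during fitting."""
    oof = [0.0] * len(X)
    for fold in range(k):
        train = [i for i in range(len(X)) if i % k != fold]
        held = [i for i in range(len(X)) if i % k == fold]
        learner = make_learner().fit([X[i] for i in train], [y[i] for i in train])
        for i, p in zip(held, learner.predict_proba([X[i] for i in held])):
            oof[i] = p
    return oof


X, y = make_toy_data()
X_train, y_train, X_test, y_test = X[:150], y[:150], X[150:], y[150:]
factories = [lambda: Stump(0), lambda: Stump(1)]

# Level-one data: one column of out-of-fold predictions per base learner.
Z_train = list(zip(*[out_of_fold(f, X_train, y_train) for f in factories]))
meta = LogisticMeta().fit(Z_train, y_train)

# At prediction time, base learners refit on all training rows feed the meta-learner.
base = [f().fit(X_train, y_train) for f in factories]
Z_test = list(zip(*[b.predict_proba(X_test) for b in base]))
stacked = meta.predict_proba(Z_test)
accuracy = sum((p >= 0.5) == bool(t) for p, t in zip(stacked, y_test)) / len(y_test)
print(f"stacked accuracy on held-out toy data: {accuracy:.2f}")
```

Training the meta-learner on out-of-fold predictions is the usual guard against leakage in stacking: if the base learners scored the same rows they were fitted on, the meta-learner would learn to trust overfitted outputs.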

Results: Four experiments were conducted to evaluate feature selection and HPO methods. All stacked ensemble models outperformed individual learners, with the XGBoost meta-learner optimized with grid search and LASSO FS achieving the highest performance: 93.9% accuracy and 99.4% AUC. While RF and GLM meta-learners were also evaluated, they were outperformed by the XGBoost meta-learner. SHAP analysis revealed key features influencing Penta 3 dropout, including the place of delivery, decision-making autonomy, the mother's level of earning, and healthcare access. Home delivery increased the risk of dropout, while postnatal care by midwives and health insurance coverage lowered dropout likelihood.
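
For readers unfamiliar with the two metrics reported above, here is a small sketch of how accuracy and AUC can be computed from predicted scores. It assumes that "AUC" refers to the area under the ROC curve, which the abstract does not state explicitly, and it uses four made-up scores rather than any data from the study. The helper names `accuracy` and `auc` are hypothetical.

```python
def accuracy(y_true, scores, threshold=0.5):
    """Share of cases whose thresholded score matches the true label."""
    preds = [1 if s >= threshold else 0 for s in scores]
    return sum(p == t for p, t in zip(preds, y_true)) / len(y_true)


def auc(y_true, scores):
    """ROC AUC as the probability that a random positive outranks a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Made-up labels (1 = dropout) and predicted scores, for illustration only.
y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]

acc = accuracy(y_true, scores)  # 3 of 4 labels matched at threshold 0.5
roc_auc = auc(y_true, scores)   # 3 of 4 positive/negative pairs ranked correctly
print(acc, roc_auc)             # prints 0.75 0.75
```

Accuracy depends on the chosen threshold, while AUC measures ranking quality across all thresholds, which is why the two figures in the abstract (93.9% and 99.4%) need not coincide.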

Conclusion and recommendation: This study provides insights into the factors influencing Penta 3 vaccination dropout in East Africa. To reduce dropout rates, interventions should focus on enhancing maternal livelihood opportunities, improving healthcare access in rural areas, and promoting institutional deliveries.

Source journal metrics: CiteScore 5.20; self-citation rate 3.20%; articles published 122; review time 13 weeks.