Implementation of ensemble machine learning classifiers to predict diarrhoea with SMOTEENN, SMOTE, and SMOTETomek class imbalance approaches

Elliot Mbunge, M. Sibiya, Sam Takavarasha, R. Millham, Garikayi B. Chemhaka, Benhildah Muchemwa, T. Dzinamarira
Published in: 2023 Conference on Information Communications Technology and Society (ICTAS), 1 March 2023
DOI: 10.1109/ICTAS56421.2023.10082744 (https://doi.org/10.1109/ICTAS56421.2023.10082744)
Citations: 3

Abstract

Diarrhoea continues to be a major public health burden and cause of death among children under 5 years of age in many developing countries. Rotavirus vaccination, hygiene practices, clean water, and health promotion are among the preventive measures implemented to improve child health. Tackling diarrhoea, however, also requires integrating ensemble machine learning (ML) into health systems, and such integration is still nascent in many developing countries. This study therefore applied the SMOTE, SMOTEENN and SMOTETomek class imbalance approaches together with ensemble ML classifiers to predict diarrhoea. Ensemble methods significantly improve the performance of conventional ML classifiers. With SMOTEENN, the ExtraTrees classifier achieved a recall of 96.3%, accuracy of 94.3%, precision of 93.8% and F1-score of 95%, outperforming its results with SMOTE and SMOTETomek. The performance of the HistGradientBoosting classifier also improved, reaching a recall of 95.2%, accuracy of 91.5%, precision of 90.4% and F1-score of 92.7%. The paper also shows that ensemble methods are increasingly becoming state-of-the-art solutions to common challenges of ML algorithms, such as overfitting, underfitting, computational cost and representation. There is a need to develop data-driven applications that incorporate ensemble methods to model and predict diarrhoea, helping policymakers craft interventions aimed at improving child health.
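The abstract does not describe how the resampling was implemented. As a minimal sketch, assuming binary labels, numeric feature tuples and Euclidean distance, SMOTEENN can be illustrated in pure Python: SMOTE creates synthetic minority samples by interpolating between a minority sample and one of its k nearest minority neighbours, and Edited Nearest Neighbours (ENN) then drops any sample whose k nearest neighbours mostly carry a different label. The function names (`smote`, `enn`, `smote_enn`) and the defaults `k_smote=5` and `k_enn=3` are illustrative assumptions, not the authors' settings. In practice, the imbalanced-learn library provides `SMOTE`, `SMOTEENN` and `SMOTETomek` classes, and scikit-learn provides `ExtraTreesClassifier` and `HistGradientBoostingClassifier`; library implementations may use different neighbour-selection rules than this sketch.

```python
import math
import random
from collections import Counter


def nearest_indices(point, X, k, exclude):
    """Indices of the k points in X closest to `point` (Euclidean), skipping index `exclude`."""
    candidates = [j for j in range(len(X)) if j != exclude]
    return sorted(candidates, key=lambda j: math.dist(point, X[j]))[:k]


def smote(minority, n_new, k=5, rng=None):
    """SMOTE sketch: create n_new synthetic points by interpolating between a
    random minority sample and one of its k nearest minority neighbours.
    Assumes at least two minority samples."""
    if rng is None:
        rng = random.Random(0)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        neighbour = minority[rng.choice(nearest_indices(base, minority, k, i))]
        gap = rng.random()  # position along the segment base -> neighbour, in [0, 1)
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbour)))
    return synthetic


def enn(X, y, k=3):
    """Edited Nearest Neighbours (majority-vote rule): keep a sample only if a
    strict majority of its k nearest neighbours share its label."""
    X_kept, y_kept = [], []
    for i, (x, label) in enumerate(zip(X, y)):
        votes = Counter(y[j] for j in nearest_indices(x, X, k, i))
        if 2 * votes[label] > sum(votes.values()):
            X_kept.append(x)
            y_kept.append(label)
    return X_kept, y_kept


def smote_enn(X, y, minority_label, k_smote=5, k_enn=3, seed=0):
    """SMOTEENN sketch for binary labels: oversample the minority class up to
    the majority count, then clean the combined set with ENN."""
    rng = random.Random(seed)
    minority = [x for x, label in zip(X, y) if label == minority_label]
    n_majority = len(y) - len(minority)
    synthetic = smote(minority, max(n_majority - len(minority), 0), k_smote, rng)
    X_bal = list(X) + synthetic
    y_bal = list(y) + [minority_label] * len(synthetic)
    return enn(X_bal, y_bal, k_enn)


if __name__ == "__main__":
    # Hypothetical toy data: 20 majority points near (0, 0), 5 minority points near (5, 5).
    X = [(i % 5 * 0.1, i // 5 * 0.1) for i in range(20)]
    X += [(5.0, 5.0), (5.2, 5.0), (5.0, 5.2), (5.2, 5.2), (5.1, 5.1)]
    y = [0] * 20 + [1] * 5
    X_res, y_res = smote_enn(X, y, minority_label=1)
    print(Counter(y_res))  # → Counter({0: 20, 1: 20})
```

Because the two toy clusters are well separated, ENN removes nothing here and the output is balanced 20:20; with overlapping classes, ENN would also discard borderline samples from both classes. Resampling is normally applied to the training split only, so that test metrics such as recall and precision are computed on the original class distribution.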