Machine learning models for prediction of double and triple burdens of non-communicable diseases in Bangladesh

Impact factor 1.5; Zone 3 (Sociology); JCR Q2 (Demography)
Md. Akib Al-Zubayer, Khorshed Alam, Hasibul Hasan Shanto, Md. Maniruzzaman, Uttam Kumar Majumder, Benojir Ahammed
Journal of Biosocial Science. Journal Article. Published 2024-03-20.
DOI: 10.1017/s0021932024000063 (https://doi.org/10.1017/s0021932024000063)
Citations: 0

Abstract

The rising prevalence of non-communicable diseases (NCDs) has made them the leading cause of death and disability in Bangladesh. This study therefore aimed to measure the prevalence of, and risk factors for, the double and triple burden of NCDs (DBNCDs and TBNCDs), considering diabetes, hypertension, and overweight and obesity, and to establish a machine learning approach for predicting DBNCDs and TBNCDs. A total of 12,151 respondents from the 2017 to 2018 Bangladesh Demographic and Health Survey were included in the analysis; 10%, 27.4%, and 24.3% of respondents had diabetes, hypertension, and overweight and obesity, respectively. The chi-square test and multilevel logistic regression (LR) analysis were applied to select factors associated with DBNCDs and TBNCDs. Six classifiers, namely decision tree (DT), LR, naïve Bayes (NB), k-nearest neighbour (KNN), random forest (RF), and extreme gradient boosting (XGBoost), were then evaluated under three cross-validation protocols (K2, K5, and K10) to predict the status of DBNCDs and TBNCDs. Classification accuracy (ACC) and area under the curve (AUC) were computed for each protocol, the procedure was repeated 10 times for robustness, and the average ACC and AUC were reported. The prevalence of DBNCDs and TBNCDs was 14.3% and 2.3%, respectively. DBNCDs and TBNCDs were significantly associated with age, sex, marital status, wealth index, education, and geographic region. Compared with the other classifiers, the RF-based classifier provided the highest ACC and AUC for both DBNCDs (ACC = 81.06%, AUC = 0.93) and TBNCDs (ACC = 88.61%, AUC = 0.97) under the K10 protocol. Combining the two-step factor selection with an RF-based classifier can better predict the burden of NCDs. The findings suggest that decision-makers could use RF classifiers to inform decisions on controlling and preventing the burden of NCDs.
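The evaluation protocol described in the abstract (K-fold cross-validation, repeated 10 times, with ACC and AUC averaged) can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes K2, K5, and K10 denote 2-, 5-, and 10-fold cross-validation, uses synthetic two-feature data in place of the survey data, uses a plain k-nearest-neighbour scorer with 5 neighbours as a stand-in classifier, and assumes stratified folds. All function names (`make_data`, `stratified_folds`, `knn_scores`, `auc_score`, `repeated_cv`) are hypothetical.

```python
import random


def make_data(n=200, seed=1):
    """Synthetic stand-in data: 30% positives, two Gaussian features."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = 1 if i % 10 < 3 else 0
        centre = 1.5 if label else 0.0
        X.append((rng.gauss(centre, 1.0), rng.gauss(centre, 1.0)))
        y.append(label)
    return X, y


def stratified_folds(y, k, rng):
    """Shuffle each class, then deal its indices round-robin into k folds."""
    folds = [[] for _ in range(k)]
    for label in sorted(set(y)):
        idx = [i for i, v in enumerate(y) if v == label]
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            folds[j % k].append(i)
    return folds


def knn_scores(train_X, train_y, test_X, n_neighbors=5):
    """Score each test point by the share of positives among its neighbours."""
    scores = []
    for x in test_X:
        dists = sorted(
            (sum((a - b) ** 2 for a, b in zip(x, t)), label)
            for t, label in zip(train_X, train_y)
        )
        nearest = dists[:n_neighbors]
        scores.append(sum(label for _, label in nearest) / n_neighbors)
    return scores


def auc_score(y_true, scores):
    """AUC as the probability a positive outranks a negative (ties count 0.5)."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def repeated_cv(X, y, k, repeats=10, seed=0):
    """Run k-fold CV `repeats` times; return mean ACC and mean AUC over folds."""
    rng = random.Random(seed)
    accs, aucs = [], []
    for _ in range(repeats):
        for fold in stratified_folds(y, k, rng):
            held_out = set(fold)
            train = [i for i in range(len(y)) if i not in held_out]
            scores = knn_scores(
                [X[i] for i in train], [y[i] for i in train], [X[i] for i in fold]
            )
            truth = [y[i] for i in fold]
            preds = [int(s >= 0.5) for s in scores]
            accs.append(sum(p == t for p, t in zip(preds, truth)) / len(truth))
            aucs.append(auc_score(truth, scores))
    return sum(accs) / len(accs), sum(aucs) / len(aucs)


if __name__ == "__main__":
    X, y = make_data()
    for k in (2, 5, 10):
        acc, auc = repeated_cv(X, y, k)
        print(f"K{k}: mean ACC = {acc:.3f}, mean AUC = {auc:.3f}")
```

Reporting AUC alongside ACC matters for outcomes as rare as the TBNCDs (2.3% prevalence), because a classifier that always predicts the majority class would already reach high accuracy.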

Source journal
CiteScore: 3.00
Self-citation rate: 6.70%
Articles published: 108
Journal description: Journal of Biosocial Science is a leading interdisciplinary and international journal in the field of biosocial science, the common ground between biology and sociology. It acts as an essential reference guide for all biological and social scientists working in these interdisciplinary areas, including social and biological aspects of reproduction and its control, gerontology, ecology, genetics, applied psychology, sociology, education, criminology, demography, health and epidemiology. Publishing original research papers, short reports, reviews, lectures and book reviews, the journal also includes a Debate section that encourages readers' comments on specific articles, with subsequent response from the original author.