Shahid Mohammad Ganie, Bobba Bharath Reddy, Hemachandran K, Manjeet Rege
An investigation of ensemble learning techniques for obesity risk prediction using lifestyle data
Decision Analytics Journal, Volume 14, Article 100539, published 2024-12-30. DOI: 10.1016/j.dajour.2024.100539
Abstract
Obesity is a significant health issue that affects millions of people worldwide. Identifying the underlying causes of obesity risk at an early stage remains challenging for medical practitioners. The growing volume of obesity-related lifestyle data makes it imperative to employ effective machine learning algorithms that can extract insights from trends in the data and identify critical patient conditions. In this study, three ensemble learning approaches, namely boosting, bagging, and voting, were used to predict obesity risk from a lifestyle dataset. Specifically, XGBoost, Gradient Boosting, and CatBoost models were used for boosting; Bagged Decision Tree, Random Forest, and Extra Trees models for bagging; and Logistic Regression, Decision Tree, and Support Vector Machine models for voting. Several preprocessing steps were applied to improve data quality. Hyperparameter tuning, feature selection, and feature ranking were also used to improve prediction results. The models were extensively evaluated and compared using multiple performance metrics. Among all the models, XGBoost performed best, with an accuracy of 98.10%, a precision and recall of 97.50% each, an F1-score of 96.50%, and an AUC-ROC of 100%. Additionally, the recursive feature elimination method identified and ranked weight, height, and age as the most significant predictors of obesity risk. The proposed model can be used in the healthcare industry to help healthcare providers better predict and detect multiple stages of obesity.
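A voting ensemble combines the predictions of several base models into one decision. The following is a minimal sketch of hard (majority) voting in plain Python. The rule functions `rule_a`, `rule_b`, and `rule_c` and their thresholds are hypothetical stand-ins for trained Logistic Regression, Decision Tree, and Support Vector Machine models, and the abstract does not state whether the study used hard or soft voting.

```python
from collections import Counter
from typing import Callable, Sequence

# A base classifier is modelled as any function from a feature vector to a label.
Classifier = Callable[[Sequence[float]], str]


def hard_vote(classifiers: Sequence[Classifier], x: Sequence[float]) -> str:
    """Majority (hard) vote over the labels predicted by each base classifier.

    Counter.most_common orders equal counts by first appearance, so a tie goes
    to the label predicted by the earliest classifier in the list.
    """
    votes = [clf(x) for clf in classifiers]
    label, _ = Counter(votes).most_common(1)[0]
    return label


def bmi(x: Sequence[float]) -> float:
    """Body mass index from a (weight_kg, height_m) feature vector."""
    weight_kg, height_m = x
    return weight_kg / height_m ** 2


# Hypothetical stand-ins for trained base models; rules and thresholds are
# illustrative only and are not taken from the study.
def rule_a(x: Sequence[float]) -> str:
    return "obese" if bmi(x) >= 30 else "not obese"


def rule_b(x: Sequence[float]) -> str:
    return "obese" if x[0] >= 100 else "not obese"


def rule_c(x: Sequence[float]) -> str:
    return "obese" if bmi(x) >= 28 else "not obese"


ensemble = [rule_a, rule_b, rule_c]

if __name__ == "__main__":
    for person in [(95.0, 1.75), (80.0, 1.80), (88.0, 1.75)]:
        print(person, hard_vote(ensemble, person))
```

Soft voting would instead average the predicted class probabilities of the base models, which requires models that output probabilities rather than only labels.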
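Recursive feature elimination (RFE) ranks features by repeatedly fitting a model, discarding the feature with the smallest importance, and refitting on the features that remain. The sketch below assumes a plain least-squares linear model on standardized features, uses absolute coefficients as the importance measure, and eliminates down to a single feature so that every feature receives a distinct rank (rank 1 = most important). The estimator, the helper names, and the toy data (including the hypothetical `noise` feature) are assumptions for illustration, not the study's configuration.

```python
from typing import Dict, List


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve the square linear system a @ x = b (Gaussian elimination, partial pivoting)."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def standardize(col: List[float]) -> List[float]:
    """Z-score a column so that coefficient magnitudes are comparable across features."""
    mean = sum(col) / len(col)
    sd = (sum((v - mean) ** 2 for v in col) / len(col)) ** 0.5
    return [(v - mean) / sd for v in col]


def linear_coefficients(columns: List[List[float]], y: List[float]) -> List[float]:
    """Ordinary least squares via the normal equations (X^T X) w = X^T y."""
    k = len(columns)
    xtx = [[sum(p * q for p, q in zip(columns[i], columns[j])) for j in range(k)]
           for i in range(k)]
    xty = [sum(p * q for p, q in zip(columns[i], y)) for i in range(k)]
    return solve(xtx, xty)


def rfe_ranking(features: Dict[str, List[float]], y: List[float]) -> Dict[str, int]:
    """Recursive feature elimination down to one feature; rank 1 = most important."""
    y_mean = sum(y) / len(y)
    y_centered = [v - y_mean for v in y]  # features are centred too, so no intercept is needed
    remaining = list(features)
    ranking: Dict[str, int] = {}
    while remaining:
        columns = [standardize(features[name]) for name in remaining]
        coefs = linear_coefficients(columns, y_centered)  # refit on surviving features
        weakest = min(range(len(remaining)), key=lambda i: abs(coefs[i]))
        rank = len(remaining)  # the feature dropped now gets the worst remaining rank
        ranking[remaining.pop(weakest)] = rank
    return ranking


# Hypothetical toy data: the target is an exact linear function of weight and
# height, and "noise" is unrelated to it.
weight = [60.0, 72.0, 85.0, 95.0, 110.0, 78.0]
height = [1.60, 1.75, 1.70, 1.80, 1.65, 1.90]
noise = [3.0, 1.0, 4.0, 1.0, 5.0, 9.0]
target = [0.5 * w - 10.0 * h for w, h in zip(weight, height)]

if __name__ == "__main__":
    print(rfe_ranking({"weight": weight, "height": height, "noise": noise}, target))
```

In scikit-learn, `sklearn.feature_selection.RFE` wraps the same loop around any estimator exposing `coef_` or `feature_importances_` and reports the result in its `ranking_` attribute; setting `n_features_to_select=1` yields a full ranking like the one above.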
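For a multi-class target such as obesity stages, precision, recall, and F1-score are computed for each class and then averaged. The abstract does not say which averaging was used; the sketch below assumes macro-averaging (an unweighted mean over classes) and is written in plain Python with hypothetical labels. Because macro F1 averages the per-class F1 values rather than combining the averaged precision and recall, it can fall below both of them.

```python
from typing import Dict, Sequence, Tuple


def class_counts(y_true: Sequence[str], y_pred: Sequence[str], label: str) -> Tuple[int, int, int]:
    """True positives, false positives, and false negatives for one class (one-vs-rest)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
    return tp, fp, fn


def ratio(num: float, den: float) -> float:
    return num / den if den else 0.0  # treat 0/0 as 0, a common convention


def macro_metrics(y_true: Sequence[str], y_pred: Sequence[str]) -> Dict[str, float]:
    """Accuracy plus macro-averaged precision, recall, and F1-score."""
    labels = sorted(set(y_true) | set(y_pred))
    precisions, recalls, f1s = [], [], []
    for label in labels:
        tp, fp, fn = class_counts(y_true, y_pred, label)
        p = ratio(tp, tp + fp)
        r = ratio(tp, tp + fn)
        precisions.append(p)
        recalls.append(r)
        f1s.append(ratio(2 * p * r, p + r))  # per-class harmonic mean of p and r
    k = len(labels)
    accuracy = sum(yt == yp for yt, yp in zip(y_true, y_pred)) / len(y_true)
    return {
        "accuracy": accuracy,
        "precision": sum(precisions) / k,
        "recall": sum(recalls) / k,
        "f1": sum(f1s) / k,
    }


if __name__ == "__main__":
    # Hypothetical labels: class "A" is under-predicted, class "B" over-predicted.
    print(macro_metrics(["A", "A", "B"], ["A", "B", "B"]))
```

In this toy case, macro precision and macro recall are both 0.75, while macro F1 is about 0.667 and accuracy is 2/3.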