An investigation of ensemble learning techniques for obesity risk prediction using lifestyle data

Decision Analytics Journal, Volume 14, Article 100539 | Pub Date: 2024-12-30 | DOI: 10.1016/j.dajour.2024.100539

Shahid Mohammad Ganie, Bobba Bharath Reddy, Hemachandran K, Manjeet Rege

Citations: 0

Abstract

Obesity is a significant health issue that affects millions of people worldwide. Identifying the underlying causes of obesity risk at an early stage is challenging for medical practitioners. The growing volume of obesity-related lifestyle data makes it imperative to employ effective machine-learning algorithms that can extract insights from underlying data trends and identify critical patient conditions. In this study, an ensemble learning approach comprising boosting, bagging, and voting techniques was used for obesity risk prediction on a lifestyle dataset. Specifically, XGBoost, Gradient Boosting, and CatBoost were used for boosting; Bagged Decision Tree, Random Forest, and Extra Tree were used for bagging; and Logistic Regression, Decision Tree, and Support Vector Machine were used for voting. Several preprocessing steps were applied to improve data quality, and hyperparameter tuning together with feature selection and ranking was used to improve prediction results. The models were evaluated and compared using several metrics. Among all the models, XGBoost performed best, with an accuracy of 98.10%, precision and recall of 97.50%, an F1-score of 96.50%, and an AUC-ROC of 100%. Additionally, recursive feature elimination identified and ranked weight, height, and age as the most significant predictors of obesity risk. The proposed model can be used in the healthcare industry to support healthcare providers in predicting and detecting multiple stages of obesity.
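The voting step and the evaluation metrics named in the abstract can be illustrated with a small standard-library sketch. The abstract does not say whether the voting ensemble uses hard or soft voting, nor how the multi-class metrics are averaged, so hard (majority) voting and macro averaging are assumptions here. The class labels, the three base-model prediction lists, and the function names are hypothetical toy values, not data or code from the study.

```python
from collections import Counter


def majority_vote(predictions_per_model):
    """Hard voting: for each sample, return the label most models predicted.

    Ties go to the tied label that appears first in model order.
    """
    combined = []
    for votes in zip(*predictions_per_model):
        counts = Counter(votes)
        top = max(counts.values())
        combined.append(next(v for v in votes if counts[v] == top))
    return combined


def macro_scores(y_true, y_pred):
    """Return accuracy plus macro-averaged precision, recall, and F1."""
    labels = sorted(set(y_true) | set(y_pred))
    precisions, recalls, f1s = [], [], []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)
    n = len(labels)
    return {
        "accuracy": sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true),
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


# Toy labels and predictions (hypothetical, for illustration only).
y_true = ["normal", "overweight", "obese", "obese", "normal", "overweight"]
model_a = ["normal", "overweight", "obese", "overweight", "normal", "overweight"]
model_b = ["normal", "obese", "obese", "obese", "normal", "overweight"]
model_c = ["overweight", "overweight", "obese", "obese", "normal", "normal"]

ensemble_pred = majority_vote([model_a, model_b, model_c])
scores = macro_scores(y_true, ensemble_pred)

for name, pred in [("model_a", model_a), ("model_b", model_b),
                   ("model_c", model_c), ("ensemble", ensemble_pred)]:
    print(name, round(macro_scores(y_true, pred)["accuracy"], 3))
```

In this toy case each base model makes at least one mistake, but the mistakes fall on different samples, so the majority vote recovers every true label. That is the intuition behind voting ensembles; real base models often make correlated errors, so the gain is usually smaller.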

Source journal

Decision Analytics Journal

CiteScore

3.90

Self-citation rate

0.00%

Articles published