Dhiaa Musleh, Ali Alkhwaja, Ibrahim Alkhwaja, Mohammed Alghamdi, Hussam Abahussain, Mohammed Albugami, Faisal Alfawaz, Said El-Ashker, M. Al-Hariri
{"title":"预测大学生心血管代谢疾病风险的机器学习方法","authors":"Dhiaa Musleh, Ali Alkhwaja, Ibrahim Alkhwaja, Mohammed Alghamdi, Hussam Abahussain, Mohammed Albugami, Faisal Alfawaz, Said El-Ashker, M. Al-Hariri","doi":"10.3390/bdcc8030031","DOIUrl":null,"url":null,"abstract":"Obesity is increasingly becoming a prevalent health concern among adolescents, leading to significant risks like cardiometabolic diseases (CMDs). The early discovery and diagnosis of CMD is essential for better outcomes. This study aims to build a reliable artificial intelligence model that can predict CMD using various machine learning techniques. Support vector machines (SVMs), K-Nearest neighbor (KNN), Logistic Regression (LR), Random Forest (RF), and Gradient Boosting are five robust classifiers that are compared in this study. A novel “risk level” feature, derived through fuzzy logic applied to the Conicity Index, as a novel feature, which was previously unused, is introduced to enhance the interpretability and discriminatory properties of the proposed models. As the Conicity Index scores indicate CMD risk, two separate models are developed to address each gender individually. The performance of the proposed models is assessed using two datasets obtained from 295 records of undergraduate students in Saudi Arabia. The dataset comprises 121 male and 174 female students with diverse risk levels. Notably, Logistic Regression emerges as the top performer among males, achieving an accuracy score of 91%, while Gradient Boosting lags with a score of 72%. Among females, both Support Vector Machine and Logistic Regression lead with an accuracy score of 87%, while Random Forest performs least optimally with a score of 80%.","PeriodicalId":505155,"journal":{"name":"Big Data and Cognitive Computing","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2024-03-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Machine Learning Approaches for Predicting Risk of Cardiometabolic Disease among University Students\",\"authors\":\"Dhiaa Musleh, Ali Alkhwaja, Ibrahim Alkhwaja, Mohammed Alghamdi, Hussam Abahussain, Mohammed Albugami, Faisal Alfawaz, Said El-Ashker, M. Al-Hariri\",\"doi\":\"10.3390/bdcc8030031\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Obesity is increasingly becoming a prevalent health concern among adolescents, leading to significant risks like cardiometabolic diseases (CMDs). The early discovery and diagnosis of CMD is essential for better outcomes. This study aims to build a reliable artificial intelligence model that can predict CMD using various machine learning techniques. Support vector machines (SVMs), K-Nearest neighbor (KNN), Logistic Regression (LR), Random Forest (RF), and Gradient Boosting are five robust classifiers that are compared in this study. A novel “risk level” feature, derived through fuzzy logic applied to the Conicity Index, as a novel feature, which was previously unused, is introduced to enhance the interpretability and discriminatory properties of the proposed models. As the Conicity Index scores indicate CMD risk, two separate models are developed to address each gender individually. The performance of the proposed models is assessed using two datasets obtained from 295 records of undergraduate students in Saudi Arabia. The dataset comprises 121 male and 174 female students with diverse risk levels. Notably, Logistic Regression emerges as the top performer among males, achieving an accuracy score of 91%, while Gradient Boosting lags with a score of 72%. Among females, both Support Vector Machine and Logistic Regression lead with an accuracy score of 87%, while Random Forest performs least optimally with a score of 80%.\",\"PeriodicalId\":505155,\"journal\":{\"name\":\"Big Data and Cognitive Computing\",\"volume\":null,\"pages\":null},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2024-03-13\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Big Data and Cognitive Computing\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.3390/bdcc8030031\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Big Data and Cognitive Computing","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.3390/bdcc8030031","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Machine Learning Approaches for Predicting Risk of Cardiometabolic Disease among University Students
Obesity is increasingly becoming a prevalent health concern among adolescents, leading to significant risks like cardiometabolic diseases (CMDs). The early discovery and diagnosis of CMD is essential for better outcomes. This study aims to build a reliable artificial intelligence model that can predict CMD using various machine learning techniques. Support vector machines (SVMs), K-Nearest neighbor (KNN), Logistic Regression (LR), Random Forest (RF), and Gradient Boosting are five robust classifiers that are compared in this study. A novel “risk level” feature, derived through fuzzy logic applied to the Conicity Index, as a novel feature, which was previously unused, is introduced to enhance the interpretability and discriminatory properties of the proposed models. As the Conicity Index scores indicate CMD risk, two separate models are developed to address each gender individually. The performance of the proposed models is assessed using two datasets obtained from 295 records of undergraduate students in Saudi Arabia. The dataset comprises 121 male and 174 female students with diverse risk levels. Notably, Logistic Regression emerges as the top performer among males, achieving an accuracy score of 91%, while Gradient Boosting lags with a score of 72%. Among females, both Support Vector Machine and Logistic Regression lead with an accuracy score of 87%, while Random Forest performs least optimally with a score of 80%.