Type 2 diabetes risk prediction using glycemic control Metrics: A machine learning approach
Radwan Qasrawi, Suliman Thwib, Ghada Issa, Razan Abu Ghoush, Malak Amro
Human Nutrition and Metabolism, volume 42, Article 200341. Published 2025-08-14. DOI: 10.1016/j.hnm.2025.200341
https://www.sciencedirect.com/science/article/pii/S2666149725000453
Abstract
Background
Type 2 Diabetes Mellitus (T2DM) remains a significant global health burden, particularly in low- and middle-income settings. Conventional prevention strategies often lack personalization, overlooking individual variability in lifestyle, nutrition, and health status. This study aimed to develop a personalized T2DM risk prediction model using machine learning (ML), integrating clinical, behavioral, and dietary data, including glycemic index (GI) and glycemic load (GL) derived from actual food and recipe intake.
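The abstract does not say how GI and GL were derived from food and recipe intake. As a minimal sketch, the standard textbook definition is that the glycemic load of a portion equals its glycemic index times its available carbohydrate in grams, divided by 100; a day's GL is the sum over everything eaten, and the dietary GI is the carbohydrate-weighted mean. The `FoodItem` class, the function names, and the food values below are hypothetical illustrations, not the study's pipeline or data.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class FoodItem:
    """One eaten portion (hypothetical structure, not from the study)."""
    name: str
    gi: float                # glycemic index on the glucose = 100 scale
    available_carb_g: float  # available carbohydrate in the portion, grams


def glycemic_load(item: FoodItem) -> float:
    """GL of one portion: GI x available carbohydrate (g) / 100."""
    return item.gi * item.available_carb_g / 100.0


def daily_glycemic_load(items: list[FoodItem]) -> float:
    """Total GL: sum of the per-portion glycemic loads."""
    return sum(glycemic_load(item) for item in items)


def dietary_glycemic_index(items: list[FoodItem]) -> float:
    """Carbohydrate-weighted mean GI of all portions (0.0 if no carbohydrate)."""
    total_carb = sum(item.available_carb_g for item in items)
    if total_carb == 0:
        return 0.0
    return 100.0 * daily_glycemic_load(items) / total_carb


if __name__ == "__main__":
    # Illustrative values only; real GI values come from published GI tables.
    meal = [
        FoodItem("high-GI staple", gi=70.0, available_carb_g=40.0),
        FoodItem("low-GI legume dish", gi=30.0, available_carb_g=20.0),
    ]
    print("Daily GL:", daily_glycemic_load(meal))
    print("Dietary GI:", round(dietary_glycemic_index(meal), 1))
```

For a recipe, the same functions apply once the recipe is broken into its carbohydrate-containing ingredients, which is presumably why recipe-level intake data matter here.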
Methods
Data from 3145 Palestinian adults (aged 18–60) were analyzed using statistical and ML techniques. Variables included age, sex, education, income, physical activity, smoking status, perceived health, and detailed nutritional intake, specifically GI and GL. Nine ML models were developed using the AutoGluon-Tabular framework. Model performance was assessed via accuracy, area under the curve (AUC), and log loss. Feature importance analysis identified key predictors of T2DM risk.
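To make the three evaluation metrics concrete, here is a small dependency-free sketch for a binary outcome. It assumes the AUC refers to the ROC curve, computed here as the probability that a randomly chosen positive case is scored above a randomly chosen negative one; the 0.5 decision threshold and the toy labels and probabilities are assumptions for illustration, not values from the study. In practice these numbers would come from the modeling framework or a metrics library rather than hand-written code.

```python
import math


def accuracy(y_true, y_prob, threshold=0.5):
    """Share of cases whose thresholded prediction matches the label."""
    preds = [1 if p >= threshold else 0 for p in y_prob]
    return sum(int(p == t) for p, t in zip(preds, y_true)) / len(y_true)


def log_loss(y_true, y_prob, eps=1e-15):
    """Mean negative log-likelihood of the true labels (lower is better)."""
    total = 0.0
    for t, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total -= t * math.log(p) + (1 - t) * math.log(1.0 - p)
    return total / len(y_true)


def roc_auc(y_true, y_prob):
    """ROC AUC via pairwise comparison of positive and negative scores.

    Ties count as half a win. This is O(n_pos * n_neg), which is fine for a
    sketch but slower than rank-based implementations on large data.
    """
    pos = [p for t, p in zip(y_true, y_prob) if t == 1]
    neg = [p for t, p in zip(y_true, y_prob) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p_pos in pos:
        for p_neg in neg:
            if p_pos > p_neg:
                wins += 1.0
            elif p_pos == p_neg:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy data: 1 = diabetic, 0 = non-diabetic (made up for illustration).
    y_true = [0, 0, 1, 1]
    y_prob = [0.1, 0.4, 0.35, 0.8]
    print("accuracy:", accuracy(y_true, y_prob))
    print("AUC:     ", roc_auc(y_true, y_prob))
    print("log loss:", round(log_loss(y_true, y_prob), 4))
```

Accuracy depends on the chosen threshold, whereas AUC and log loss use the predicted probabilities directly, which is one reason to report all three.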
Results
Women had significantly higher odds of diabetes than men, while rural residents had a lower risk than urban dwellers. People aged 50–59 were over six times more likely to be diabetic than those aged 18–29. Lower education and poor perceived health were also strong predictors. Diabetic participants consumed diets with significantly lower GI (87.7 ± 36.1) and GL (241 ± 180.5) than non-diabetic participants (GI = 98.8 ± 35.5; GL = 303.3 ± 202.7; p = 0.001). Among the ML models, XGBoost and CatBoost performed best, with over 93% accuracy and excellent prediction scores. Glycemic load, age, BMI, waist-to-hip ratio, and self-reported health status were the most important risk indicators.
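Group comparisons such as these are typically expressed as odds ratios. The sketch below shows how an unadjusted odds ratio and its approximate 95% confidence interval (Woolf's log method) can be computed from a 2×2 table. The counts are hypothetical and chosen only to show the arithmetic; the abstract does not report the underlying tables, and the study's own estimates may well be adjusted for other variables.

```python
import math


def odds_ratio(exposed_cases, exposed_noncases,
               unexposed_cases, unexposed_noncases, z=1.96):
    """Unadjusted odds ratio with a Woolf (log-based) confidence interval.

    Returns (odds_ratio, lower_bound, upper_bound). z=1.96 gives ~95% coverage.
    """
    a, b, c, d = (exposed_cases, exposed_noncases,
                  unexposed_cases, unexposed_noncases)
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cell counts must be positive")
    ratio = (a * d) / (b * c)
    # Standard error of log(OR) under Woolf's approximation.
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(ratio) - z * se_log)
    upper = math.exp(math.log(ratio) + z * se_log)
    return ratio, lower, upper


if __name__ == "__main__":
    # Hypothetical counts, NOT from the study:
    # older group: 30 diabetic, 70 non-diabetic
    # younger group: 10 diabetic, 90 non-diabetic
    ratio, lower, upper = odds_ratio(30, 70, 10, 90)
    print(f"OR = {ratio:.2f}, 95% CI [{lower:.2f}, {upper:.2f}]")
```

An odds ratio approximates "times more likely" only when the outcome is rare in both groups; otherwise it overstates the ratio of risks.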
Conclusion
This study showed the effectiveness of integrating machine learning with glycemic control metrics and lifestyle data for personalized T2DM prediction. Incorporating glycemic values from real food and recipe intake improved model accuracy and interpretability. These findings support the development of precision prevention strategies tailored to individual risk profiles, particularly in underserved populations.