Dual Machine Learning Framework for Predicting Long-Term Glycemic Change and Prediabetes Risk in Young Taiwanese Men
Chung-Chi Yang, Sheng-Tang Wu, Ta-Wei Chu, Chi-Hao Liu, Yung-Jen Chuang
Diagnostics, vol. 15, no. 19. Published 2025-10-02.
DOI: 10.3390/diagnostics15192507 (https://doi.org/10.3390/diagnostics15192507)
PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12524205/pdf/
Citations: 0
Abstract
Background: Early detection of dysglycemia in young adults is important but underexplored. This study aimed to (1) predict long-term changes in fasting plasma glucose (δ-FPG) and (2) classify future prediabetes using complementary machine learning (ML) approaches.

Methods: We analyzed 6247 Taiwanese men aged 18–35 years (mean follow-up 5.9 years). For δ-FPG (continuous outcome), random forest, stochastic gradient boosting (SGB), eXtreme gradient boosting (XGBoost), and elastic net were compared with multiple linear regression using symmetric mean absolute percentage error (SMAPE), root mean squared error (RMSE), relative absolute error (RAE), and root relative squared error (RRSE). Sensitivity analyses excluded baseline FPG (FPG_base). Shapley additive explanations (SHAP) values provided interpretability, and stability was assessed across 10 repeated train-test cycles with confidence intervals. For prediabetes (binary outcome), an XGBoost classifier was trained on the top predictors, with class imbalance corrected by SMOTE-Tomek. Calibration and decision-curve analysis (DCA) were also performed.

Results: ML models consistently outperformed regression on all error metrics. FPG_base was the dominant predictor in the full models (100% importance). Without FPG_base, key predictors included body fat, white blood cell count, age, thyroid-stimulating hormone, triglycerides, and low-density lipoprotein cholesterol. The prediabetes classifier achieved accuracy 0.788, precision 0.791, sensitivity 0.995, ROC-AUC 0.667, and PR-AUC 0.873. At a high-sensitivity threshold (0.2892), sensitivity reached 99.53% (specificity 47.46%); at a balanced threshold (0.5683), sensitivity was 88.69% and specificity 90.61%. Calibration was acceptable (Brier score 0.1754), and DCA indicated clinical utility.

Conclusions: FPG_base is the strongest predictor of glycemic change, but adiposity, inflammation, thyroid status, and lipids remain informative. A dual, interpretable ML framework offers clinically actionable tools for screening and risk stratification in young men.
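The abstract names four regression error metrics for δ-FPG and several threshold-based measures for the prediabetes classifier without stating their formulas. The sketch below illustrates them using only the Python standard library, assuming the common textbook definitions. SMAPE in particular has several variants, so the one used here (denominator (|A| + |F|) / 2, scaled to percent) is an assumption. DCA net benefit is taken as NB = TP/n − FP/n · p_t / (1 − p_t). The helper names (`smape`, `rae`, `sens_spec`, `net_benefit`, and so on) and the toy data are hypothetical; they are not the authors' code or study data, and the SMOTE-Tomek resampling and XGBoost training steps are not reproduced.

```python
import math
from typing import Sequence, Tuple


# ---- Regression error metrics for delta-FPG (textbook definitions, assumed) ----

def smape(actual: Sequence[float], pred: Sequence[float]) -> float:
    """Symmetric MAPE in percent: mean of |F - A| / ((|A| + |F|) / 2), times 100.

    Pairs where both values are 0 contribute 0 (one common convention; assumed).
    """
    total = 0.0
    for a, f in zip(actual, pred):
        denom = (abs(a) + abs(f)) / 2
        total += 0.0 if denom == 0 else abs(f - a) / denom
    return 100.0 * total / len(actual)


def rmse(actual: Sequence[float], pred: Sequence[float]) -> float:
    """Root mean squared error, in the outcome's own units."""
    sq = sum((f - a) ** 2 for a, f in zip(actual, pred))
    return math.sqrt(sq / len(actual))


def rae(actual: Sequence[float], pred: Sequence[float]) -> float:
    """Total absolute error relative to always predicting the mean of `actual`."""
    mean_a = sum(actual) / len(actual)
    num = sum(abs(f - a) for a, f in zip(actual, pred))
    den = sum(abs(a - mean_a) for a in actual)
    return num / den


def rrse(actual: Sequence[float], pred: Sequence[float]) -> float:
    """Square root of the squared error relative to the mean-only baseline."""
    mean_a = sum(actual) / len(actual)
    num = sum((f - a) ** 2 for a, f in zip(actual, pred))
    den = sum((a - mean_a) ** 2 for a in actual)
    return math.sqrt(num / den)


# ---- Classifier metrics (label 1 = prediabetes, 0 = no prediabetes) ----

def sens_spec(y_true: Sequence[int], prob: Sequence[float],
              threshold: float) -> Tuple[float, float]:
    """Sensitivity and specificity when predicting positive for prob >= threshold."""
    tp = fn = tn = fp = 0
    for y, p in zip(y_true, prob):
        predicted_pos = p >= threshold
        if y == 1 and predicted_pos:
            tp += 1
        elif y == 1:
            fn += 1
        elif predicted_pos:
            fp += 1
        else:
            tn += 1
    sens = tp / (tp + fn) if tp + fn else float("nan")
    spec = tn / (tn + fp) if tn + fp else float("nan")
    return sens, spec


def brier(y_true: Sequence[int], prob: Sequence[float]) -> float:
    """Mean squared difference between predicted probability and the 0/1 outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, prob)) / len(y_true)


def net_benefit(y_true: Sequence[int], prob: Sequence[float], pt: float) -> float:
    """Decision-curve net benefit at threshold probability pt: TP/n - FP/n * pt/(1 - pt)."""
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, prob) if y == 1 and p >= pt)
    fp = sum(1 for y, p in zip(y_true, prob) if y == 0 and p >= pt)
    return tp / n - fp / n * (pt / (1 - pt))


if __name__ == "__main__":
    # Hypothetical toy values, not study data.
    actual, pred = [2.0, -1.0, 4.0, 0.0], [1.5, -0.5, 3.0, 1.0]
    print(smape(actual, pred), rmse(actual, pred), rae(actual, pred), rrse(actual, pred))
    y, p = [1, 1, 0, 0, 1], [0.9, 0.4, 0.3, 0.6, 0.2]
    for t in (0.25, 0.5):  # a low (high-sensitivity) and a higher cut-off
        print(t, sens_spec(y, p, t), net_benefit(y, p, t))
    print(brier(y, p))
```

The threshold loop mirrors the trade-off reported in the abstract: lowering the cut-off raises sensitivity at the cost of specificity. An RAE or RRSE below 1 means the model beats a predictor that always outputs the mean δ-FPG.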