Dual Machine Learning Framework for Predicting Long-Term Glycemic Change and Prediabetes Risk in Young Taiwanese Men.

IF 3.3 · Zone 3 (Medicine) · JCR Q1 · MEDICINE, GENERAL & INTERNAL
Chung-Chi Yang, Sheng-Tang Wu, Ta-Wei Chu, Chi-Hao Liu, Yung-Jen Chuang
Journal: Diagnostics, vol. 15, no. 19
Publication date: 2025-10-02
Publication type: Journal Article
DOI: 10.3390/diagnostics15192507 (https://doi.org/10.3390/diagnostics15192507)
PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12524205/pdf/
Citations: 0

Abstract

Background: Early detection of dysglycemia in young adults is important but underexplored. This study aimed to (1) predict long-term changes in fasting plasma glucose (δ-FPG) and (2) classify future prediabetes using complementary machine learning (ML) approaches.

Methods: We analyzed 6247 Taiwanese men aged 18-35 years (mean follow-up 5.9 years). For δ-FPG (continuous outcome), random forest, stochastic gradient boosting (SGB), eXtreme gradient boosting (XGBoost), and elastic net were compared with multiple linear regression using symmetric mean absolute percentage error (SMAPE), root mean squared error (RMSE), relative absolute error (RAE), and root relative squared error (RRSE). Sensitivity analyses excluded baseline FPG (FPGbase). Shapley additive explanations (SHAP) values provided interpretability, and stability was assessed across 10 repeated train-test cycles with confidence intervals. For prediabetes (binary outcome), an XGBoost classifier was trained on the top predictors, with class imbalance corrected by SMOTE-Tomek. Calibration and decision-curve analysis (DCA) were also performed.

Results: ML models consistently outperformed regression on all error metrics. FPGbase was the dominant predictor in the full models (100% importance). Without FPGbase, key predictors included body fat, white blood cell count, age, thyroid-stimulating hormone, triglycerides, and low-density lipoprotein cholesterol. The prediabetes classifier achieved accuracy 0.788, precision 0.791, sensitivity 0.995, ROC-AUC 0.667, and PR-AUC 0.873. At a high-sensitivity threshold (0.2892), sensitivity reached 99.53% (specificity 47.46%); at a balanced threshold (0.5683), sensitivity was 88.69% and specificity was 90.61%. Calibration was acceptable (Brier 0.1754), and DCA indicated clinical utility.

Conclusions: FPGbase is the strongest predictor of glycemic change, but adiposity, inflammation, thyroid status, and lipids remain informative. A dual interpretable ML framework offers clinically actionable tools for screening and risk stratification in young men.
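
The four error metrics named in the Methods can be illustrated with a minimal sketch. The abstract does not give the formulas the authors used, so the definitions below are assumptions based on common textbook forms: SMAPE here uses the variant with denominator (|y| + |ŷ|)/2 and is reported as a fraction rather than a percentage, and RAE and RRSE are taken relative to a predictor that always outputs the mean of the observed values. The function name `error_metrics` and the sample values are hypothetical and are not data from the study.

```python
import math


def error_metrics(y_true, y_pred):
    """Return SMAPE, RMSE, RAE and RRSE for paired observations.

    Assumed definitions (not confirmed by the source abstract):
      SMAPE = mean( |y - p| / ((|y| + |p|) / 2) ), as a fraction
      RMSE  = sqrt( mean( (y - p)^2 ) )
      RAE   = sum|y - p| / sum|y - mean(y)|
      RRSE  = sqrt( sum (y - p)^2 / sum (y - mean(y))^2 )
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")

    n = len(y_true)
    mean_true = sum(y_true) / n
    abs_err = [abs(t - p) for t, p in zip(y_true, y_pred)]
    sq_err = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]

    # A pair where both values are zero contributes 0 to SMAPE,
    # which avoids a division by zero.
    smape_terms = []
    for t, p, e in zip(y_true, y_pred, abs_err):
        denom = (abs(t) + abs(p)) / 2
        smape_terms.append(e / denom if denom > 0 else 0.0)

    # Baseline errors of the "always predict the mean" model.
    # These are zero if y_true is constant, in which case RAE and RRSE
    # are undefined and this sketch raises ZeroDivisionError.
    base_abs = sum(abs(t - mean_true) for t in y_true)
    base_sq = sum((t - mean_true) ** 2 for t in y_true)

    return {
        "smape": sum(smape_terms) / n,
        "rmse": math.sqrt(sum(sq_err) / n),
        "rae": sum(abs_err) / base_abs,
        "rrse": math.sqrt(sum(sq_err) / base_sq),
    }


# Illustrative values only (not study data).
observed = [1.0, 2.0, 3.0]
predicted = [1.0, 2.0, 5.0]
metrics = error_metrics(observed, predicted)
```

Under these definitions, RAE and RRSE equal 1 for a model that always predicts the observed mean, so values below 1 indicate an improvement over that baseline. SMAPE can behave erratically when the outcome is close to zero, which is worth keeping in mind for a change score such as δ-FPG.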


Source journal: Diagnostics
Subject area: Biochemistry, Genetics and Molecular Biology - Clinical Biochemistry
CiteScore: 4.70
Self-citation rate: 8.30%
Articles published: 2699
Review time: 19.64 days
Journal description: Diagnostics (ISSN 2075-4418) is an international scholarly open access journal on medical diagnostics. It publishes original research articles, reviews, communications and short notes on the research and development of medical diagnostics. There is no restriction on the length of the papers. Our aim is to encourage scientists to publish their experimental and theoretical research in as much detail as possible. Full experimental and/or methodological details must be provided for research articles.