{"title":"A Novel Machine Learning-Driven Voice and Clinical Biomarkers Framework for Robust Prediction of Type 2 Diabetes Mellitus.","authors":"Jingjing Guo, Weiqun Peng, Shuping Hu, Donghui Lu, Shuo Chen","doi":"10.1016/j.jvoice.2025.09.033","DOIUrl":null,"url":null,"abstract":"<p><strong>Objective: </strong>To develop and validate a multimodal, machine learning-based framework that integrates acoustic voice features with baseline clinical parameters for noninvasive and accurate screening of type 2 diabetes mellitus (T2DM).</p><p><strong>Materials and methods: </strong>We analyzed data from 3129 individuals, including 1158 with T2DM and 1971 without. Voice recordings were collected under standardized conditions and processed with the openSMILE toolkit to extract 88 acoustic features, encompassing prosodic, spectral, cepstral, and quality-related parameters. In parallel, 30 clinical features were obtained from demographic, anthropometric, biochemical, lifestyle, and medical history variables. After preprocessing and imputation, feature selection was conducted using LASSO, ANOVA, Mutual Information, and Recursive Feature Elimination. Dimensionality reduction with Principal Component Analysis was also evaluated. Models, including Logistic Regression, Random Forest, XGBoost, TabNet, and TabTransformer, were trained with cross-validation and tuned through grid and randomized searches. Performance was assessed on an independent test set using accuracy, recall, and area under the curve (AUC). Model interpretability was addressed via SHAP analysis, t-SNE visualization, and radar plots. Clinical utility was assessed with nomogram construction, calibration, and decision curve analysis (DCA).</p><p><strong>Results: </strong>Models using clinical features alone achieved moderate performance (AUC ≈ 69%). Acoustic-only models performed better, with the LASSO + XGBoost combination reaching an AUC of 80.8%. The fused feature set markedly outperformed both unimodal approaches, with the LASSO + XGBoost model achieving 94.1% accuracy, 93.6% recall, and an AUC of 95.2%. SHAP analysis identified HbA1c, fasting glucose, HOMA-IR, and acoustic markers such as jitter and shimmer as top predictors. Calibration plots showed excellent agreement between predicted and observed probabilities, while DCA demonstrated superior net clinical benefit.</p><p><strong>Conclusions: </strong>Our multimodal framework provides an accurate, interpretable, and clinically actionable approach for noninvasive T2DM screening. Future studies should validate these findings in diverse populations and explore integration into real-world digital health platforms.</p>","PeriodicalId":49954,"journal":{"name":"Journal of Voice","volume":" ","pages":""},"PeriodicalIF":2.4000,"publicationDate":"2025-10-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Voice","FirstCategoryId":"3","ListUrlMain":"https://doi.org/10.1016/j.jvoice.2025.09.033","RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"AUDIOLOGY & SPEECH-LANGUAGE PATHOLOGY","Score":null,"Total":0}
Abstract
Objective: To develop and validate a multimodal, machine learning-based framework that integrates acoustic voice features with baseline clinical parameters for noninvasive and accurate screening of type 2 diabetes mellitus (T2DM).
Materials and methods: We analyzed data from 3129 individuals, including 1158 with T2DM and 1971 without. Voice recordings were collected under standardized conditions and processed with the openSMILE toolkit to extract 88 acoustic features, encompassing prosodic, spectral, cepstral, and quality-related parameters. In parallel, 30 clinical features were obtained from demographic, anthropometric, biochemical, lifestyle, and medical history variables. After preprocessing and imputation, feature selection was conducted using LASSO, ANOVA, Mutual Information, and Recursive Feature Elimination. Dimensionality reduction with Principal Component Analysis was also evaluated. Models, including Logistic Regression, Random Forest, XGBoost, TabNet, and TabTransformer, were trained with cross-validation and tuned through grid and randomized searches. Performance was assessed on an independent test set using accuracy, recall, and area under the curve (AUC). Model interpretability was addressed via SHAP analysis, t-SNE visualization, and radar plots. Clinical utility was assessed with nomogram construction, calibration, and decision curve analysis (DCA).
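To make the described pipeline concrete, below is a minimal Python sketch (not the authors' released code) of the acoustic extraction and the best-performing LASSO + XGBoost combination: openSMILE's eGeMAPSv02 functional set yields exactly 88 features per recording, an L1-penalized selector plays the LASSO role, and XGBoost is scored with cross-validated AUC. The file name, synthetic stand-in data, and all hyperparameter values are illustrative assumptions; the study tuned its models through grid and randomized searches.

```python
import numpy as np
import opensmile
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from xgboost import XGBClassifier

# eGeMAPSv02 functionals: 88 acoustic features (prosodic, spectral,
# cepstral, voice quality) per recording, as extracted in the study.
smile = opensmile.Smile(
    feature_set=opensmile.FeatureSet.eGeMAPSv02,
    feature_level=opensmile.FeatureLevel.Functionals,
)
# acoustic = smile.process_file("recording.wav")  # hypothetical file -> 1 x 88 DataFrame

# Synthetic stand-in for the fused matrix: 88 acoustic + 30 clinical features.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 118))
y = rng.integers(0, 2, size=200)

# LASSO-style (L1) selection feeding XGBoost; C and the XGBoost settings
# are placeholders, not the tuned values from the paper.
pipe = Pipeline([
    ("scale", StandardScaler()),
    ("select", SelectFromModel(
        LogisticRegression(penalty="l1", solver="liblinear", C=1.0))),
    ("clf", XGBClassifier(n_estimators=300, max_depth=4,
                          learning_rate=0.05, eval_metric="logloss")),
])
mean_auc = cross_val_score(pipe, X, y, cv=5, scoring="roc_auc").mean()
```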
Results: Models using clinical features alone achieved moderate performance (AUC ≈ 69%). Acoustic-only models performed better, with the LASSO + XGBoost combination reaching an AUC of 80.8%. The fused feature set markedly outperformed both unimodal approaches, with the LASSO + XGBoost model achieving 94.1% accuracy, 93.6% recall, and an AUC of 95.2%. SHAP analysis identified HbA1c, fasting glucose, HOMA-IR, and acoustic markers such as jitter and shimmer as top predictors. Calibration plots showed excellent agreement between predicted and observed probabilities, while DCA demonstrated superior net clinical benefit.
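The reported evaluation and interpretability steps can likewise be sketched in a few lines. The snippet below (continuing from the sketch above, with the same synthetic stand-in data) computes held-out accuracy, recall, and AUC, derives SHAP attributions from the fitted XGBoost step, and implements the decision-curve net benefit NB(pt) = TP/n − (FP/n) · pt/(1 − pt) that underlies DCA; it illustrates the general technique, not the study's exact scripts.

```python
import shap
from sklearn.metrics import accuracy_score, recall_score, roc_auc_score
from sklearn.model_selection import train_test_split

# Held-out split standing in for the study's independent test set.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
pipe.fit(X_tr, y_tr)
proba = pipe.predict_proba(X_te)[:, 1]

print("accuracy:", accuracy_score(y_te, proba >= 0.5))
print("recall:  ", recall_score(y_te, proba >= 0.5))
print("AUC:     ", roc_auc_score(y_te, proba))

# SHAP on the tree model: features are ranked by mean(|SHAP value|),
# the kind of ranking that surfaced HbA1c, jitter, and shimmer as top predictors.
X_te_sel = pipe[:-1].transform(X_te)          # apply scaling + selection steps
explainer = shap.TreeExplainer(pipe.named_steps["clf"])
shap_values = explainer.shap_values(X_te_sel)

# Decision-curve analysis: net benefit at a threshold probability pt.
def net_benefit(y_true, p, pt):
    pred = p >= pt
    tp = np.sum(pred & (y_true == 1))
    fp = np.sum(pred & (y_true == 0))
    n = len(y_true)
    return tp / n - (fp / n) * pt / (1 - pt)

nb_curve = [net_benefit(y_te, proba, t) for t in np.arange(0.05, 0.95, 0.05)]
```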
Conclusions: Our multimodal framework provides an accurate, interpretable, and clinically actionable approach for noninvasive T2DM screening. Future studies should validate these findings in diverse populations and explore integration into real-world digital health platforms.
Journal Introduction:
The Journal of Voice is widely regarded as the world's premier journal for voice medicine and research. This peer-reviewed publication is listed in Index Medicus and is indexed by the Institute for Scientific Information. The journal contains articles written by experts throughout the world on all topics in voice sciences, voice medicine and surgery, and speech-language pathologists' management of voice-related problems. The journal includes clinical articles, clinical research, and laboratory research. Members of the Foundation receive the journal as a benefit of membership.