{"title":"A Novel Machine Learning-Driven Voice and Clinical Biomarkers Framework for Robust Prediction of Type 2 Diabetes Mellitus.","authors":"Jingjing Guo, Weiqun Peng, Shuping Hu, Donghui Lu, Shuo Chen","doi":"10.1016/j.jvoice.2025.09.033","DOIUrl":null,"url":null,"abstract":"<p><strong>Objective: </strong>To develop and validate a multimodal, machine learning-based framework that integrates acoustic voice features with baseline clinical parameters for noninvasive and accurate screening of type 2 diabetes mellitus (T2DM).</p><p><strong>Materials and methods: </strong>We analyzed data from 3129 individuals, including 1158 with T2DM and 1971 without. Voice recordings were collected under standardized conditions and processed with the openSMILE toolkit to extract 88 acoustic features, encompassing prosodic, spectral, cepstral, and quality-related parameters. In parallel, 30 clinical features were obtained from demographic, anthropometric, biochemical, lifestyle, and medical history variables. After preprocessing and imputation, feature selection was conducted using LASSO, ANOVA, Mutual Information, and Recursive Feature Elimination. Dimensionality reduction with Principal Component Analysis was also evaluated. Models, including Logistic Regression, Random Forest, XGBoost, TabNet, and TabTransformer, were trained with cross-validation and tuned through grid and randomized searches. Performance was assessed on an independent test set using accuracy, recall, and area under the curve (AUC). Model interpretability was addressed via SHAP analysis, t-SNE visualization, and radar plots. Clinical utility was assessed with nomogram construction, calibration, and decision curve analysis (DCA).</p><p><strong>Results: </strong>Models using clinical features alone achieved moderate performance (AUC ≈ 69%). Acoustic-only models performed better, with the LASSO + XGBoost combination reaching an AUC of 80.8%. The fused feature set markedly outperformed both unimodal approaches, with the LASSO + XGBoost model achieving 94.1% accuracy, 93.6% recall, and an AUC of 95.2%. SHAP analysis identified HbA1c, fasting glucose, HOMA-IR, and acoustic markers such as jitter and shimmer as top predictors. Calibration plots showed excellent agreement between predicted and observed probabilities, while DCA demonstrated superior net clinical benefit.</p><p><strong>Conclusions: </strong>Our multimodal framework provides an accurate, interpretable, and clinically actionable approach for noninvasive T2DM screening. Future studies should validate these findings in diverse populations and explore integration into real-world digital health platforms.</p>","PeriodicalId":49954,"journal":{"name":"Journal of Voice","volume":" ","pages":""},"PeriodicalIF":2.4000,"publicationDate":"2025-10-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Voice","FirstCategoryId":"3","ListUrlMain":"https://doi.org/10.1016/j.jvoice.2025.09.033","RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"AUDIOLOGY & SPEECH-LANGUAGE PATHOLOGY","Score":null,"Total":0}
Abstract
Objective: To develop and validate a multimodal, machine learning-based framework that integrates acoustic voice features with baseline clinical parameters for noninvasive and accurate screening of type 2 diabetes mellitus (T2DM).
Materials and methods: We analyzed data from 3129 individuals, including 1158 with T2DM and 1971 without. Voice recordings were collected under standardized conditions and processed with the openSMILE toolkit to extract 88 acoustic features, encompassing prosodic, spectral, cepstral, and quality-related parameters. In parallel, 30 clinical features were obtained from demographic, anthropometric, biochemical, lifestyle, and medical history variables. After preprocessing and imputation, feature selection was conducted using LASSO, ANOVA, Mutual Information, and Recursive Feature Elimination. Dimensionality reduction with Principal Component Analysis was also evaluated. Models, including Logistic Regression, Random Forest, XGBoost, TabNet, and TabTransformer, were trained with cross-validation and tuned through grid and randomized searches. Performance was assessed on an independent test set using accuracy, recall, and area under the curve (AUC). Model interpretability was addressed via SHAP analysis, t-SNE visualization, and radar plots. Clinical utility was assessed with nomogram construction, calibration, and decision curve analysis (DCA).
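To make the described pipeline concrete, below is a minimal Python sketch (not the authors' released code) of the acoustic extraction and the best-performing LASSO + XGBoost combination: openSMILE's eGeMAPSv02 functional set yields exactly 88 features per recording, an L1-penalized selector plays the LASSO role, and XGBoost is scored with cross-validated AUC. The file name, synthetic stand-in data, and all hyperparameter values are illustrative assumptions; the study tuned its models through grid and randomized searches.

```python
import numpy as np
import opensmile
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from xgboost import XGBClassifier

# eGeMAPSv02 functionals: 88 acoustic features (prosodic, spectral,
# cepstral, voice quality) per recording, as extracted in the study.
smile = opensmile.Smile(
    feature_set=opensmile.FeatureSet.eGeMAPSv02,
    feature_level=opensmile.FeatureLevel.Functionals,
)
# acoustic = smile.process_file("recording.wav")  # hypothetical file -> 1 x 88 DataFrame

# Synthetic stand-in for the fused matrix: 88 acoustic + 30 clinical features.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 118))
y = rng.integers(0, 2, size=200)

# LASSO-style (L1) selection feeding XGBoost; C and the XGBoost settings
# are placeholders, not the tuned values from the paper.
pipe = Pipeline([
    ("scale", StandardScaler()),
    ("select", SelectFromModel(
        LogisticRegression(penalty="l1", solver="liblinear", C=1.0))),
    ("clf", XGBClassifier(n_estimators=300, max_depth=4,
                          learning_rate=0.05, eval_metric="logloss")),
])
mean_auc = cross_val_score(pipe, X, y, cv=5, scoring="roc_auc").mean()
```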
Results: Models using clinical features alone achieved moderate performance (AUC ≈ 69%). Acoustic-only models performed better, with the LASSO + XGBoost combination reaching an AUC of 80.8%. The fused feature set markedly outperformed both unimodal approaches, with the LASSO + XGBoost model achieving 94.1% accuracy, 93.6% recall, and an AUC of 95.2%. SHAP analysis identified HbA1c, fasting glucose, HOMA-IR, and acoustic markers such as jitter and shimmer as top predictors. Calibration plots showed excellent agreement between predicted and observed probabilities, while DCA demonstrated superior net clinical benefit.
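The reported evaluation and interpretability steps can likewise be sketched in a few lines. The snippet below (continuing from the sketch above, with the same synthetic stand-in data) computes held-out accuracy, recall, and AUC, derives SHAP attributions from the fitted XGBoost step, and implements the decision-curve net benefit NB(pt) = TP/n − (FP/n) · pt/(1 − pt) that underlies DCA; it illustrates the general technique, not the study's exact scripts.

```python
import shap
from sklearn.metrics import accuracy_score, recall_score, roc_auc_score
from sklearn.model_selection import train_test_split

# Held-out split standing in for the study's independent test set.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
pipe.fit(X_tr, y_tr)
proba = pipe.predict_proba(X_te)[:, 1]

print("accuracy:", accuracy_score(y_te, proba >= 0.5))
print("recall:  ", recall_score(y_te, proba >= 0.5))
print("AUC:     ", roc_auc_score(y_te, proba))

# SHAP on the tree model: features are ranked by mean(|SHAP value|),
# the kind of ranking that surfaced HbA1c, jitter, and shimmer as top predictors.
X_te_sel = pipe[:-1].transform(X_te)          # apply scaling + selection steps
explainer = shap.TreeExplainer(pipe.named_steps["clf"])
shap_values = explainer.shap_values(X_te_sel)

# Decision-curve analysis: net benefit at a threshold probability pt.
def net_benefit(y_true, p, pt):
    pred = p >= pt
    tp = np.sum(pred & (y_true == 1))
    fp = np.sum(pred & (y_true == 0))
    n = len(y_true)
    return tp / n - (fp / n) * pt / (1 - pt)

nb_curve = [net_benefit(y_te, proba, t) for t in np.arange(0.05, 0.95, 0.05)]
```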
Conclusions: Our multimodal framework provides an accurate, interpretable, and clinically actionable approach for noninvasive T2DM screening. Future studies should validate these findings in diverse populations and explore integration into real-world digital health platforms.
Journal Introduction:
The Journal of Voice is widely regarded as the world's premier journal for voice medicine and research. This peer-reviewed publication is listed in Index Medicus and is indexed by the Institute for Scientific Information. The journal contains articles written by experts throughout the world on all topics in voice sciences, voice medicine and surgery, and speech-language pathologists' management of voice-related problems. The journal includes clinical articles, clinical research, and laboratory research. Members of the Foundation receive the journal as a benefit of membership.