Development and Validation of DIANA (Diabetes Novel Subgroup Assessment tool): A web-based precision medicine tool to determine type 2 diabetes endotype membership and predict individuals at risk of microvascular disease.
Viswanathan Baskar, Mani Arun Vignesh, Sumanth C Raman, Arun Jijo, Bhavadharini Balaji, Nico Steckhan, Lena Maria Klara Roth, Moneeza K Siddiqui, Saravanan Jebarani, Ranjit Unnikrishnan, Viswanathan Mohan, Ranjit Mohan Anjana
PLOS Digital Health 4(8): e0000702. Published 2025-08-05. DOI: 10.1371/journal.pdig.0000702
Abstract
Background: Previous research has identified four distinct endotypes of type 2 diabetes (T2D) in Asian Indians: Severe Insulin Deficient Diabetes (SIDD), Combined Insulin Resistant and Deficient Diabetes (CIRDD), Insulin Resistance and Obese Diabetes (IROD), and Mild Age-related Diabetes (MARD). DIANA (Diabetes Novel Subgroup Assessment) is an online precision medicine tool that predicts an individual's T2D endotype and their risk of retinopathy and nephropathy.
Methodology: DIANA assigns T2D endotype membership using a support vector machine (SVM) classifier trained with k-fold cross-validation on T2D subgroups in the Asian Indian population. Its performance was compared with that of an algorithm based on conditional pre-determined cut-offs and weights for each clinical feature: age at diagnosis, BMI, waist circumference, HbA1c, serum triglycerides, HDL-cholesterol and, optionally, fasting and stimulated C-peptide. Local interpretable model-agnostic explanations (LIME) and SHapley Additive exPlanations (SHAP) were used to explain the endotype prediction model. A random forest model was built to assess an individual's risk of nephropathy and retinopathy using individual risk algorithms.
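SHAP explains a single prediction by giving each input feature its Shapley value: the feature's marginal contribution to the model output, averaged over all coalitions S of the other features with weight |S|!(n - |S| - 1)!/n!. The following is a minimal exact computation under stated assumptions, not the authors' pipeline and not the `shap` library's API: features absent from a coalition are replaced by a single baseline value, and `toy_score` is a hypothetical three-feature score standing in for one endotype's model output.

```python
from __future__ import annotations

from itertools import combinations
from math import factorial
from typing import Callable, Sequence


def exact_shapley(
    f: Callable[[Sequence[float]], float],
    x: Sequence[float],
    baseline: Sequence[float],
) -> list[float]:
    """Exact Shapley value of each feature for the single prediction f(x).

    A feature outside a coalition is set to its baseline value, so the
    coalition value is v(S) = f(x on S, baseline elsewhere).
    """
    n = len(x)

    def value(coalition: frozenset[int]) -> float:
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return f(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):  # coalition sizes 0 .. n-1 drawn from the other features
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = frozenset(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def toy_score(z: Sequence[float]) -> float:
    # Hypothetical score over three standardized inputs (BMI, HbA1c, age at diagnosis);
    # not DIANA's model. The product term adds a simple feature interaction.
    bmi, hba1c, age = z
    return 0.8 * bmi + 0.5 * hba1c - 0.3 * age + 0.2 * bmi * hba1c


phi = exact_shapley(toy_score, x=[1.0, 2.0, -1.0], baseline=[0.0, 0.0, 0.0])
print([round(p, 3) for p in phi])  # [1.0, 1.2, 0.3]
```

The attributions sum to `toy_score(x) - toy_score(baseline)` (2.5 here), and the interaction term is split equally between BMI and HbA1c. Exact enumeration grows as 2^n model calls, so it is only practical for a handful of features; SHAP's explainers approximate the same quantity for larger models.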
Findings: Compared with the conditional pre-determined cut-offs, the SVM model achieved higher accuracy (98% vs 63.6%), specificity (99.8% vs 88%), sensitivity (98.5% vs 65.1%), and precision (98.7% vs 63.4%). In clinician face-value validation, the SVM model again outperformed the cut-off algorithm in accuracy (97% vs 85%), specificity (95.3% vs 63%), sensitivity (95.8% vs 73%), and precision (98.9% vs 66.9%). LIME and SHAP analyses showed how individual features influenced the ML models' predictions. The random forest risk prediction model achieved an accuracy of 89.6% (p < 0.05) for nephropathy and 78.4% (p < 0.05) for retinopathy.
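The abstract reports accuracy, specificity, sensitivity and precision for a four-endotype classifier but does not state how per-endotype values were combined. A minimal sketch, assuming one-vs-rest metrics for each endotype averaged with equal weight (macro averaging) and accuracy taken as the overall fraction of correct assignments; the confusion matrix is hypothetical and is not study data.

```python
from __future__ import annotations

from typing import Sequence


def _safe_div(num: int, den: int) -> float:
    # Return 0.0 instead of raising when a class has no members or no predictions.
    return num / den if den else 0.0


def macro_metrics(cm: Sequence[Sequence[int]]) -> dict[str, float]:
    """One-vs-rest metrics from a square confusion matrix, macro-averaged over classes.

    cm[i][j] counts individuals whose true endotype is i and predicted endotype is j.
    """
    k = len(cm)
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(k))
    sens, spec, prec = [], [], []
    for c in range(k):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp                        # true class c, predicted as another class
        fp = sum(cm[r][c] for r in range(k)) - tp   # predicted as c, true class is another
        tn = total - tp - fn - fp                   # neither true nor predicted class c
        sens.append(_safe_div(tp, tp + fn))
        spec.append(_safe_div(tn, tn + fp))
        prec.append(_safe_div(tp, tp + fp))
    return {
        "accuracy": _safe_div(correct, total),
        "sensitivity": sum(sens) / k,
        "specificity": sum(spec) / k,
        "precision": sum(prec) / k,
    }


# Hypothetical counts; rows (true) and columns (predicted) ordered SIDD, CIRDD, IROD, MARD.
cm = [
    [8, 1, 0, 1],
    [0, 9, 1, 0],
    [0, 0, 10, 0],
    [1, 0, 0, 9],
]
print({name: round(v, 3) for name, v in macro_metrics(cm).items()})
# {'accuracy': 0.9, 'sensitivity': 0.9, 'specificity': 0.967, 'precision': 0.899}
```

Under this convention, specificity tends to exceed the other metrics, because each endotype's negatives include every individual in the other three endotypes, so true negatives dominate.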
Conclusion: DIANA is an accurate, clinically explainable AI tool that clinicians can use to make informed risk-assessment decisions and to provide precision management for individuals with new-onset type 2 diabetes.