Development and Validation of DIANA (Diabetes Novel Subgroup Assessment tool): A web-based precision medicine tool to determine type 2 diabetes endotype membership and predict individuals at risk of microvascular disease.

IF 7.7

PLOS digital health. Pub Date: 2025-08-05. eCollection Date: 2025-08-01. DOI: 10.1371/journal.pdig.0000702

Viswanathan Baskar, Mani Arun Vignesh, Sumanth C Raman, Arun Jijo, Bhavadharini Balaji, Nico Steckhan, Lena Maria Klara Roth, Moneeza K Siddiqui, Saravanan Jebarani, Ranjit Unnikrishnan, Viswanathan Mohan, Ranjit Mohan Anjana

Volume 4, issue 8, article e0000702. Publication type: Journal Article. Full-text PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12324136/pdf/

Citations: 0

Abstract

Background: Previous research has identified four distinct endotypes of type 2 diabetes in Asian Indians, which include Severe Insulin Deficient Diabetes (SIDD), Combined Insulin Resistant and Deficient Diabetes (CIRDD), Insulin Resistance and Obese Diabetes (IROD), and Mild Age-related Diabetes (MARD). DIANA (Diabetes Novel Subgroup Assessment) is an online precision medicine tool that can predict endotype membership of type 2 diabetes and individual risk for retinopathy and nephropathy.

Methodology: DIANA assigns subgroup membership with a support vector machine (SVM), trained with k-fold cross-validation on the type 2 diabetes (T2D) subgroups in the Asian Indian population. Its performance was compared with a rule-based algorithm that applies conditional, pre-determined cut-offs and weights to each clinical feature: age at diagnosis, BMI, waist circumference, HbA1c, serum triglycerides, HDL cholesterol and, optionally, fasting and stimulated C-peptide. Local interpretable model-agnostic explanations (LIME) and SHapley Additive exPlanations (SHAP) were used to explain the endotype prediction model. A random forest model was built to assess an individual's risk for nephropathy and retinopathy based on individual risk algorithms.
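As an illustration of the SVM step, the following is a minimal sketch of a four-class SVM evaluated with stratified k-fold cross-validation in scikit-learn. It is not the authors' pipeline: the data are synthetic stand-ins for six clinical features, and the RBF kernel, `C=1.0`, feature scaling, and `k = 5` are all assumptions, since the abstract does not report these choices.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic placeholder data: 6 columns standing in for the six mandatory
# clinical features, 4 classes standing in for SIDD, CIRDD, IROD and MARD.
# These are NOT patient data and carry no clinical meaning.
X, y = make_classification(
    n_samples=400,
    n_features=6,
    n_informative=4,
    n_redundant=0,
    n_classes=4,
    n_clusters_per_class=1,
    random_state=0,
)

# Scaling inside the pipeline means the scaler is refit on each training
# fold, so no information leaks from the held-out fold.
# Kernel and C are assumed values, not taken from the paper.
model = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0, gamma="scale"))

# Stratified folds keep the class proportions similar across folds.
# k = 5 is an assumption; the abstract only says "k-fold".
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(model, X, y, cv=cv, scoring="accuracy")

print("fold accuracies:", np.round(scores, 3))
print("mean accuracy:", round(float(scores.mean()), 3))
```

Wrapping the scaler and classifier in one pipeline is a common design choice for SVMs, which are sensitive to feature scale; whether DIANA does the same is not stated in the abstract.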

Findings: The SVM model outperformed the conditional pre-determined cut-offs on accuracy (98% vs 63.6%), specificity (99.8% vs 88%), sensitivity (98.5% vs 65.1%), and precision (98.7% vs 63.4%). In face validation by clinicians, the SVM model's predictions again scored higher than the cut-off algorithm on accuracy (97% vs 85%), specificity (95.3% vs 63%), sensitivity (95.8% vs 73%), and precision (98.9% vs 66.9%). LIME and SHAP analyses showed how individual features influenced the model's predictions. The random forest risk prediction model reached an accuracy of 89.6% (p < 0.05) for nephropathy and 78.4% (p < 0.05) for retinopathy.
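For readers unfamiliar with how the four reported metrics are obtained for a four-class problem, here is a small sketch that computes them one-vs-rest and macro-averages them. The averaging scheme is an assumption (the abstract does not say how per-class values were combined), the function name `one_vs_rest_metrics` is hypothetical, and the eight labels are toy values, not study data.

```python
from typing import Dict, Sequence


def one_vs_rest_metrics(
    y_true: Sequence[str], y_pred: Sequence[str], labels: Sequence[str]
) -> Dict[str, float]:
    """Overall accuracy plus macro-averaged sensitivity, specificity, precision."""
    pairs = list(zip(y_true, y_pred))
    n = len(pairs)
    sens, spec, prec = [], [], []
    for label in labels:
        # Treat `label` as the positive class and all other classes as negative.
        tp = sum(t == label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        tn = n - tp - fn - fp
        sens.append(tp / (tp + fn) if tp + fn else 0.0)
        spec.append(tn / (tn + fp) if tn + fp else 0.0)
        prec.append(tp / (tp + fp) if tp + fp else 0.0)
    k = len(labels)
    return {
        "accuracy": sum(t == p for t, p in pairs) / n,
        "sensitivity": sum(sens) / k,
        "specificity": sum(spec) / k,
        "precision": sum(prec) / k,
    }


# Toy example: one CIRDD case is misclassified as IROD.
labels = ["SIDD", "CIRDD", "IROD", "MARD"]
y_true = ["SIDD", "SIDD", "CIRDD", "CIRDD", "IROD", "IROD", "MARD", "MARD"]
y_pred = ["SIDD", "SIDD", "CIRDD", "IROD", "IROD", "IROD", "MARD", "MARD"]

metrics = one_vs_rest_metrics(y_true, y_pred, labels)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

In this toy case the single error lowers sensitivity for CIRDD and both specificity and precision for IROD, which shows why the four metrics can differ even when overall accuracy is high.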

Conclusion: We conclude that DIANA is an accurate, clinically explainable AI tool that clinicians can use to make informed decisions on risk assessment and to provide precision management to individuals with new-onset type 2 diabetes.


