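Risk factor identification and classification of diabetic retinopathy among Northeast Indian population using machine learning models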

Impact factor: 1.7 | JCR: Q2 | Category: Public, Environmental & Occupational Health

Clinical Epidemiology and Global Health | Pub Date: 2025-09-09 | DOI: 10.1016/j.cegh.2025.102170

Bishamber Nath, Srilekha Anumulapuri, Amir Ali, Rupam Das, Priyank Bhola, Manabjyoti Barman, Srinivasa Rao Mutheneni, Ramu Adela

Volume 36, Article 102170. Publication type: Journal Article. Full text: https://www.sciencedirect.com/science/article/pii/S221339842500260X

Citations: 0



Risk factor identification and classification of diabetic retinopathy among Northeast Indian population using machine learning models

Problem considered

Diabetic retinopathy (DR) is the most common cause of blindness among the working-age population. With global diabetes prevalence escalating, identifying risk factors is crucial for prioritizing DR diagnosis. This study aimed to determine key risk factors in the Northeast Indian population and to classify DR using artificial intelligence models.

Methods

Twenty-seven clinical and biochemical characteristics of 188 individuals were analysed to identify DR risk factors. The individuals fell into four groups: healthy control (HC), type-2 diabetes mellitus (T2DM), non-proliferative DR (NPDR), and proliferative DR (PDR). Data were analysed using four machine learning (ML) models, and Shapley Additive Explanation (SHAP) analysis was applied to the best-performing model, a random forest (RF), to assess the clinical relevance of features. Additionally, convolutional neural network (CNN) and graph neural network (GNN) models were employed for DR classification using fundus images.
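As a rough illustration of what a SHAP analysis attributes to each feature, the sketch below computes exact Shapley values for a toy scoring function by enumerating feature coalitions. It is not the authors' pipeline: the abstract does not say which SHAP implementation was used, and the scoring function, feature values, and baseline here are all invented. Only the three feature names are borrowed from the abstract.

```python
from itertools import combinations
from math import factorial

# Feature names borrowed from the abstract; every number below is invented.
FEATURES = ["uric_acid", "t2dm_duration", "hba1c"]


def toy_risk_score(x):
    """Stand-in for a trained model: a made-up score with one interaction term."""
    return (0.4 * x["uric_acid"]
            + 0.1 * x["t2dm_duration"]
            + 0.3 * x["hba1c"]
            + 0.05 * x["t2dm_duration"] * x["hba1c"])


def shapley_values(model, instance, baseline):
    """Exact Shapley values; features outside a coalition take baseline values."""
    n = len(FEATURES)

    def value(coalition):
        mixed = {f: (instance[f] if f in coalition else baseline[f])
                 for f in FEATURES}
        return model(mixed)

    phi = {}
    for f in FEATURES:
        others = [g for g in FEATURES if g != f]
        total = 0.0
        for k in range(len(others) + 1):
            # Shapley weight for coalitions of size k that exclude feature f.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                with_f = value(set(subset) | {f})
                without_f = value(set(subset))
                total += weight * (with_f - without_f)
        phi[f] = total
    return phi


# Hypothetical patient and reference point (not taken from the study).
instance = {"uric_acid": 7.0, "t2dm_duration": 10.0, "hba1c": 9.0}
baseline = {"uric_acid": 5.0, "t2dm_duration": 0.0, "hba1c": 5.5}

phi = shapley_values(toy_risk_score, instance, baseline)
for name, contribution in phi.items():
    print(f"{name}: {contribution:+.3f}")
```

The attributions sum to the difference between the score for the instance and the score for the baseline, which is the property that makes SHAP values readable as per-feature contributions. Exhaustive enumeration is only feasible for a handful of features; with twenty-seven features, as in the study, an approximate or model-specific algorithm would be needed.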

Results

Among the four classifiers, the RF model achieved 100% training accuracy and 92% test accuracy. On the test dataset, the RF model achieved an area under the ROC curve (AUC) of 1 for HC and PDR, and AUCs of 0.96 and 0.97 for T2DM and NPDR, respectively. SHAP analysis identified uric acid levels, T2DM duration, glycosylated haemoglobin (HbA1c), fasting blood sugar (FBS), and tobacco/betelnut chewing as significant predictors of DR. The GNN model outperformed the CNN in fundus image classification, achieving 82% test accuracy and an AUC of 0.85.
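Per-class AUCs like those reported above are typically obtained by treating each class in turn as "positive" and all others as "negative" (one-vs-rest); the abstract does not state the exact procedure, so this is an assumption. The sketch below shows that calculation in plain Python using the pairwise-ranking definition of AUC. The labels and predicted probabilities are invented toy data, not study results.

```python
def binary_auc(labels, scores):
    """AUC as the chance a positive outscores a negative (ties count as 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def one_vs_rest_auc(y_true, proba, classes):
    """Compute one AUC per class from a matrix of predicted class probabilities."""
    result = {}
    for j, cls in enumerate(classes):
        labels = [1 if y == cls else 0 for y in y_true]
        scores = [row[j] for row in proba]
        result[cls] = binary_auc(labels, scores)
    return result


CLASSES = ["HC", "T2DM", "NPDR", "PDR"]

# Invented toy data: two samples per group, columns ordered as in CLASSES.
y_true = ["HC", "HC", "T2DM", "T2DM", "NPDR", "NPDR", "PDR", "PDR"]
proba = [
    [0.90, 0.05, 0.03, 0.02],
    [0.70, 0.20, 0.05, 0.05],
    [0.10, 0.60, 0.25, 0.05],
    [0.05, 0.30, 0.55, 0.10],  # a T2DM case scored closer to NPDR
    [0.02, 0.35, 0.53, 0.10],
    [0.03, 0.12, 0.70, 0.15],
    [0.01, 0.04, 0.15, 0.80],
    [0.02, 0.03, 0.25, 0.70],
]

auc = one_vs_rest_auc(y_true, proba, CLASSES)
for cls in CLASSES:
    print(f"{cls}: AUC = {auc[cls]:.3f}")
```

In this toy data the two extreme groups (HC and PDR) are perfectly separated while the adjacent T2DM and NPDR groups are partly confused, which is the same qualitative pattern as the reported results. With a small test set, a per-class AUC of 1 can rest on very few samples, so confidence intervals are worth reporting alongside it.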

Conclusion

The RF model effectively identified DR risk factors in the Northeast Indian population, while the GNN model demonstrated robust classification accuracy. Integrating machine learning (ML) and deep learning (DL) enhances early DR risk assessment and diagnosis, improving disease management and patient outcomes.


Source journal

Clinical Epidemiology and Global Health (Public, Environmental & Occupational Health)

CiteScore: 4.60

Self-citation rate: 7.70%

Articles published: 218

Review time: 66 days

About the journal: Clinical Epidemiology and Global Health (CEGH) is a multidisciplinary journal published four times a year (March, June, September, December). Its mandate is to promote articles on clinical epidemiology, with a focus on developing countries in the context of global health; articles from other countries are also accepted. It publishes original research across all disciplines of medicine and allied sciences related to clinical epidemiology and global health. Article types include Original articles, Review articles, Evidence Summaries, and Letters to the Editor. All articles published in CEGH are peer-reviewed and published online for immediate access and citation.