Bishamber Nath, Srilekha Anumulapuri, Amir Ali, Rupam Das, Priyank Bhola, Manabjyoti Barman, Srinivasa Rao Mutheneni, Ramu Adela
Clinical Epidemiology and Global Health, Volume 36, Article 102170. Published 2025-09-09. DOI: 10.1016/j.cegh.2025.102170
Risk factor identification and classification of diabetic retinopathy among Northeast Indian population using machine learning models
Problem considered
Diabetic retinopathy (DR) is the most common cause of blindness among the working-age population. With global diabetes prevalence rising, identifying risk factors is crucial for prioritizing DR diagnosis. This study aimed to determine key DR risk factors in the Northeast Indian population and to classify DR using artificial intelligence models.
Methods
In this study, twenty-seven clinical and biochemical characteristics of 188 individuals from four groups, namely healthy controls (HC), type 2 diabetes mellitus (T2DM), non-proliferative DR (NPDR), and proliferative DR (PDR), were analysed to identify DR risk factors. The data were analysed with four machine learning (ML) models, and Shapley Additive Explanations (SHAP) analysis was applied to the best-performing model, a random forest (RF), to assess the clinical relevance of each feature. Additionally, convolutional neural network (CNN) and graph neural network (GNN) models were used to classify DR from fundus images.
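SHAP attributes a model's prediction for one individual to its input features using Shapley values from cooperative game theory. For intuition only, here is a minimal plain-Python sketch that computes exact Shapley values by brute force, assuming a simplified value function in which "absent" features are replaced by a single baseline vector. This is not how the `shap` library works for random forests (it averages over background data and uses a much faster tree-specific algorithm), and the function name `shapley_values`, the toy linear scorer, and the feature values are all hypothetical, not the study's model or data.

```python
from itertools import combinations
from math import factorial
from typing import Callable, Sequence


def shapley_values(
    predict: Callable[[Sequence[float]], float],
    x: Sequence[float],
    baseline: Sequence[float],
) -> list[float]:
    """Exact Shapley values for a single instance x (brute force, illustrative).

    Assumption: the value of a coalition S is predict(z), where z takes x's
    values for features in S and the baseline's values for all others.
    Cost grows as O(n * 2^n) model calls, so this only suits a few features.
    """
    n = len(x)

    def value(coalition: frozenset[int]) -> float:
        # Build the hybrid input: present features from x, absent ones from baseline.
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return predict(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n! for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = frozenset(subset)
                # Marginal contribution of feature i when added to coalition S.
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


# Hypothetical toy risk score over three features
# (e.g. uric acid, diabetes duration, HbA1c); weights are invented.
def toy_score(z: Sequence[float]) -> float:
    return 0.5 * z[0] + 0.2 * z[1] + 1.0 * z[2]


phi = shapley_values(toy_score, x=[7.0, 10.0, 9.0], baseline=[5.0, 0.0, 6.0])
print([round(v, 6) for v in phi])  # [1.0, 2.0, 3.0]
```

For a linear scorer with a single baseline, each Shapley value reduces to w_i(x_i - b_i), and the values always sum to f(x) - f(baseline) (the efficiency property), which makes the toy output easy to check by hand. The exponential enumeration is why exact brute force is impractical for all 27 features and why tree-specific algorithms are used in practice.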
Results
Among the four classifiers, the RF model achieved 100 % training accuracy and 92 % test accuracy. On the test set, it achieved an area under the ROC curve (AUC) of 1 for HC and PDR, and of 0.96 and 0.97 for T2DM and NPDR, respectively. SHAP analysis identified uric acid level, T2DM duration, glycosylated haemoglobin (HbA1c), fasting blood sugar (FBS), and tobacco/betel nut chewing as significant predictors of DR. The GNN model outperformed the CNN in fundus image classification, achieving 82 % test accuracy and an AUC of 0.85.
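The results report a separate AUC for each of the four classes, which suggests a one-vs-rest evaluation: each class in turn is treated as positive and the remaining three as negative (an assumption, since the averaging scheme is not stated here). The minimal stdlib sketch below computes AUC as the probability that a randomly chosen positive receives a higher score than a randomly chosen negative, counting ties as one half. The function names and the predicted probabilities in the demo are hypothetical and invented for illustration; they are not the study's outputs.

```python
from typing import Sequence


def binary_auc(scores: Sequence[float], positives: Sequence[bool]) -> float:
    """AUC as P(score of random positive > score of random negative); ties count 1/2."""
    pos = [s for s, p in zip(scores, positives) if p]
    neg = [s for s, p in zip(scores, positives) if not p]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def one_vs_rest_auc(
    prob_rows: Sequence[Sequence[float]],
    labels: Sequence[str],
    classes: Sequence[str],
) -> dict[str, float]:
    """Per-class AUC: class c is positive, every other class is negative."""
    result: dict[str, float] = {}
    for k, c in enumerate(classes):
        scores = [row[k] for row in prob_rows]  # predicted probability of class c
        positives = [y == c for y in labels]
        result[c] = binary_auc(scores, positives)
    return result


# Invented predictions for six individuals; columns follow `classes`.
classes = ["HC", "T2DM", "NPDR", "PDR"]
labels = ["HC", "T2DM", "NPDR", "PDR", "NPDR", "T2DM"]
probs = [
    [0.7, 0.2, 0.1, 0.0],
    [0.1, 0.6, 0.3, 0.0],
    [0.0, 0.3, 0.5, 0.2],
    [0.0, 0.0, 0.2, 0.8],
    [0.1, 0.5, 0.4, 0.0],
    [0.2, 0.4, 0.3, 0.1],
]
print(one_vs_rest_auc(probs, labels, classes))
# {'HC': 1.0, 'T2DM': 0.875, 'NPDR': 1.0, 'PDR': 1.0}
```

Under this reading, an AUC of 1 for a class means every individual of that class received a higher score for it than every individual outside it; the 0.875 for T2DM in the toy data comes from one positive being outscored by one negative (7 of 8 pairs ranked correctly). The pairwise loop is O(P·N); a rank-based (Mann-Whitney U) formulation gives the same value more efficiently on large test sets.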
Conclusion
The RF model effectively identified DR risk factors in the Northeast Indian population, while the GNN model demonstrated robust accuracy in classifying fundus images. Integrating ML and deep learning (DL) enhances early DR risk assessment and diagnosis, improving disease management and patient outcomes.
About the journal:
Clinical Epidemiology and Global Health (CEGH) is a multidisciplinary journal published four times a year (March, June, September, and December). The mandate of CEGH is to promote articles on clinical epidemiology with a focus on developing countries in the context of global health. We also accept articles from other countries. It publishes original research across all disciplines of medicine and allied sciences related to clinical epidemiology and global health. The journal publishes Original Articles, Review Articles, Evidence Summaries, and Letters to the Editor. All articles published in CEGH are peer-reviewed and published online for immediate access and citation.