A Lightweight CNN for Multiclass Retinal Disease Screening with Explainable AI.

IF 2.7 Q3 IMAGING SCIENCE & PHOTOGRAPHIC TECHNOLOGY

Journal of Imaging, Pub Date: 2025-08-15, DOI: 10.3390/jimaging11080275

Arjun Kumar Bose Arnob, Muhammad Hasibur Rashid Chayon, Fahmid Al Farid, Mohd Nizam Husen, Firoz Ahmed

Volume 11, Issue 8. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12387214/pdf/


Abstract

Timely, balanced, and transparent detection of retinal diseases is essential to avert irreversible vision loss; however, current deep learning screeners are hampered by class imbalance, large models, and opaque reasoning. This paper presents a lightweight attention-augmented convolutional neural network (CNN) that addresses all three barriers. The network combines depthwise separable convolutions, squeeze-and-excitation, and global-context attention, and it incorporates gradient-based class activation mapping (Grad-CAM) and Grad-CAM++ so that every decision is accompanied by pixel-level evidence. A ten-class color-fundus dataset of 5335 images from Bangladeshi clinics, which was severely skewed (17 to 1509 images per class), was equalized using the synthetic minority oversampling technique (SMOTE) and task-specific augmentations. Images were resized to 150×150 px and split 70:15:15. Training used the adaptive moment estimation (Adam) optimizer (initial learning rate of 1×10⁻⁴, reduce-on-plateau scheduling, early stopping), ℓ2 regularization, and dual dropout. The 16.6 M-parameter network converged in fewer than 50 epochs on a mid-range graphics processing unit (GPU) and reached 87.9% test accuracy, a macro-precision of 0.882, a macro-recall of 0.879, and a macro-F1-score of 0.880, reducing the error by 58% relative to the best ImageNet backbone (Inception-V3, 40.4% accuracy). Eight disorders recorded true-positive rates above 95%; macular scar and central serous chorioretinopathy attained F1-scores of 0.77 and 0.89, respectively. Saliency maps consistently highlighted optic disc margins, subretinal fluid, and other hallmarks. Targeted class re-balancing, lightweight attention, and integrated explainability therefore deliver accurate, transparent, and deployable retinal screening suitable for point-of-care ophthalmic triage on resource-limited hardware.
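The abstract reports macro-averaged precision, recall, and F1 alongside accuracy. The sketch below illustrates how such macro averages are commonly computed from a confusion matrix, and why they can differ from accuracy when classes are imbalanced. It is not the authors' evaluation code: the function names and the three-class confusion matrix are hypothetical, and macro-F1 is assumed here to be the unweighted mean of per-class F1 scores.

```python
from typing import List, Tuple


def macro_metrics(confusion: List[List[int]]) -> Tuple[float, float, float]:
    """Return (macro-precision, macro-recall, macro-F1).

    confusion[i][j] counts samples whose true class is i and
    whose predicted class is j. Every class gets equal weight.
    """
    n = len(confusion)
    precisions, recalls, f1s = [], [], []
    for k in range(n):
        tp = confusion[k][k]
        predicted_k = sum(confusion[i][k] for i in range(n))  # column sum
        actual_k = sum(confusion[k])                          # row sum
        p = tp / predicted_k if predicted_k else 0.0
        r = tp / actual_k if actual_k else 0.0
        f1 = 2 * p * r / (p + r) if (p + r) else 0.0
        precisions.append(p)
        recalls.append(r)
        f1s.append(f1)
    return sum(precisions) / n, sum(recalls) / n, sum(f1s) / n


def accuracy(confusion: List[List[int]]) -> float:
    """Fraction of all samples on the diagonal of the confusion matrix."""
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[k][k] for k in range(len(confusion)))
    return correct / total


# Hypothetical, deliberately imbalanced 3-class example (100 / 10 / 5 samples).
confusion = [
    [90, 5, 5],
    [2, 8, 0],
    [3, 0, 2],
]

macro_p, macro_r, macro_f1 = macro_metrics(confusion)
acc = accuracy(confusion)
print(f"accuracy={acc:.3f} macro-P={macro_p:.3f} "
      f"macro-R={macro_r:.3f} macro-F1={macro_f1:.3f}")
```

In this toy matrix the majority class dominates accuracy (100 of 115 correct), while macro-recall drops to 0.70 because the two small classes count as much as the large one. This is the reason macro-averaged scores are informative for a skewed dataset such as the one described above.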


