{"title":"基于Grad-CAM的Xception网多眼疾病检测解释","authors":"M. Raveenthini , R. Lavanya , Raul Benitez","doi":"10.1016/j.imavis.2025.105419","DOIUrl":null,"url":null,"abstract":"<div><div>Age-related macular degeneration (AMD), cataract, diabetic retinopathy (DR) and glaucoma are the four most common ocular conditions that lead to vision loss. Early detection in asymptomatic stages is necessary to alleviate vision loss. Manual diagnosis is costly, tedious, laborious and burdensome; assistive tools such as computer aided diagnosis (CAD) systems can help to alleviate these issues. Existing CAD systems for ocular diseases primarily address a single disease condition, employing disease-specific algorithms that rely on anatomical and morphological characteristics for localization of regions of interest (ROIs). The dependence on exhaustive image processing algorithms for pre-processing, ROI detection and feature extraction often results in overly complex systems prone to errors that affect classifier performance. Conglomerating many such individual diagnostic frameworks, each targeting a single disease, is not a practical solution for detecting multiple ocular diseases, especially in mass screening. Alternatively, a single generic CAD framework modeled as a multiclass problem serves to be useful in such high throughput scenarios, significantly reducing cost, time and manpower. Nevertheless, ambiguities in the overlapping features of multiple classes representing different diseases should be effectively addressed. This paper proposes a segmentation-independent approach based on deep learning (DL) to realize a single framework for the detection of different ocular conditions. The proposed work alleviates the need for pixel-level operations and segmentation techniques specific to different ocular diseases, offering a solution that has an upper hand compared to conventional systems in terms of complexity and accuracy. Further, explainability is incorporated as a value-addition that assures trust and confidence in the model. The system involves automatic feature extraction from full fundus images using Xception, a pre-trained deep model. Xception utilizes depthwise separable convolutions to capture subtle patterns in fundus images, effectively addressing the similarities between clinical indicators, such as drusen in AMD and exudates in DR, which often lead to misdiagnosis. A random over-sampling technique is performed to address class imbalance by equalizing the number of training samples across the classes. These features are fed to extreme gradient boosting (XGB) for classification. This study further aims to unveil the “black box” paradigm of model classification, by leveraging gradient-weighted class activation mapping (Grad-CAM) technique to highlight relevant ROIs. The combination of Xception based feature extraction and XGB classification results in 99.31% accuracy, 99.5% sensitivity, 99.8% specificity, 99.4% F1-score and 99.4% precision. The proposed system can be a promising tool aiding conventional manual screening in primary health care centres and mass screening scenarios for efficiently diagnosing multiple ocular diseases, enhancing personalized and remote eye care, particularly in resource-limited settings. 
By combining objective performance metrics such as accuracy, sensitivity, and specificity with subjective Grad-CAM visualizations, the system offers a comprehensive evaluation framework, ensuring transparency and building trust in ocular healthcare, making it well-suited for clinical adoption.</div></div>","PeriodicalId":50374,"journal":{"name":"Image and Vision Computing","volume":"154 ","pages":"Article 105419"},"PeriodicalIF":4.2000,"publicationDate":"2025-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Grad-CAM based explanations for multiocular disease detection using Xception net\",\"authors\":\"M. Raveenthini , R. Lavanya , Raul Benitez\",\"doi\":\"10.1016/j.imavis.2025.105419\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><div>Age-related macular degeneration (AMD), cataract, diabetic retinopathy (DR) and glaucoma are the four most common ocular conditions that lead to vision loss. Early detection in asymptomatic stages is necessary to alleviate vision loss. Manual diagnosis is costly, tedious, laborious and burdensome; assistive tools such as computer aided diagnosis (CAD) systems can help to alleviate these issues. Existing CAD systems for ocular diseases primarily address a single disease condition, employing disease-specific algorithms that rely on anatomical and morphological characteristics for localization of regions of interest (ROIs). The dependence on exhaustive image processing algorithms for pre-processing, ROI detection and feature extraction often results in overly complex systems prone to errors that affect classifier performance. Conglomerating many such individual diagnostic frameworks, each targeting a single disease, is not a practical solution for detecting multiple ocular diseases, especially in mass screening. Alternatively, a single generic CAD framework modeled as a multiclass problem serves to be useful in such high throughput scenarios, significantly reducing cost, time and manpower. Nevertheless, ambiguities in the overlapping features of multiple classes representing different diseases should be effectively addressed. This paper proposes a segmentation-independent approach based on deep learning (DL) to realize a single framework for the detection of different ocular conditions. The proposed work alleviates the need for pixel-level operations and segmentation techniques specific to different ocular diseases, offering a solution that has an upper hand compared to conventional systems in terms of complexity and accuracy. Further, explainability is incorporated as a value-addition that assures trust and confidence in the model. The system involves automatic feature extraction from full fundus images using Xception, a pre-trained deep model. Xception utilizes depthwise separable convolutions to capture subtle patterns in fundus images, effectively addressing the similarities between clinical indicators, such as drusen in AMD and exudates in DR, which often lead to misdiagnosis. A random over-sampling technique is performed to address class imbalance by equalizing the number of training samples across the classes. These features are fed to extreme gradient boosting (XGB) for classification. This study further aims to unveil the “black box” paradigm of model classification, by leveraging gradient-weighted class activation mapping (Grad-CAM) technique to highlight relevant ROIs. 
The combination of Xception based feature extraction and XGB classification results in 99.31% accuracy, 99.5% sensitivity, 99.8% specificity, 99.4% F1-score and 99.4% precision. The proposed system can be a promising tool aiding conventional manual screening in primary health care centres and mass screening scenarios for efficiently diagnosing multiple ocular diseases, enhancing personalized and remote eye care, particularly in resource-limited settings. By combining objective performance metrics such as accuracy, sensitivity, and specificity with subjective Grad-CAM visualizations, the system offers a comprehensive evaluation framework, ensuring transparency and building trust in ocular healthcare, making it well-suited for clinical adoption.</div></div>\",\"PeriodicalId\":50374,\"journal\":{\"name\":\"Image and Vision Computing\",\"volume\":\"154 \",\"pages\":\"Article 105419\"},\"PeriodicalIF\":4.2000,\"publicationDate\":\"2025-02-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Image and Vision Computing\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S0262885625000071\",\"RegionNum\":3,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q2\",\"JCRName\":\"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Image and Vision Computing","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0262885625000071","RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
Grad-CAM based explanations for multiocular disease detection using Xception net
M. Raveenthini, R. Lavanya, Raul Benitez
Image and Vision Computing, Volume 154, Article 105419 (February 2025). DOI: 10.1016/j.imavis.2025.105419
Age-related macular degeneration (AMD), cataract, diabetic retinopathy (DR) and glaucoma are the four most common ocular conditions that lead to vision loss. Early detection in asymptomatic stages is necessary to mitigate vision loss. Manual diagnosis is costly, tedious and labour-intensive; assistive tools such as computer-aided diagnosis (CAD) systems can help to alleviate these issues. Existing CAD systems for ocular diseases primarily address a single disease condition, employing disease-specific algorithms that rely on anatomical and morphological characteristics for localization of regions of interest (ROIs). The dependence on exhaustive image processing algorithms for pre-processing, ROI detection and feature extraction often results in overly complex systems prone to errors that affect classifier performance. Combining many such individual diagnostic frameworks, each targeting a single disease, is not a practical solution for detecting multiple ocular diseases, especially in mass screening. Alternatively, a single generic CAD framework modeled as a multiclass problem proves useful in such high-throughput scenarios, significantly reducing cost, time and manpower. Nevertheless, ambiguities in the overlapping features of multiple classes representing different diseases should be effectively addressed. This paper proposes a segmentation-independent approach based on deep learning (DL) to realize a single framework for the detection of different ocular conditions. The proposed work obviates the need for pixel-level operations and segmentation techniques specific to different ocular diseases, offering a solution that improves on conventional systems in both complexity and accuracy. Further, explainability is incorporated as a value addition that builds trust and confidence in the model. The system involves automatic feature extraction from full fundus images using Xception, a pre-trained deep model. Xception utilizes depthwise separable convolutions to capture subtle patterns in fundus images, effectively addressing the similarities between clinical indicators, such as drusen in AMD and exudates in DR, which often lead to misdiagnosis. A random over-sampling technique is applied to address class imbalance by equalizing the number of training samples across the classes. These features are fed to an extreme gradient boosting (XGB) classifier. This study further aims to open the “black box” of model classification by leveraging the gradient-weighted class activation mapping (Grad-CAM) technique to highlight relevant ROIs. The combination of Xception-based feature extraction and XGB classification achieves 99.31% accuracy, 99.5% sensitivity, 99.8% specificity, 99.4% F1-score and 99.4% precision. The proposed system can be a promising tool aiding conventional manual screening in primary health care centres and mass screening scenarios for efficiently diagnosing multiple ocular diseases, enhancing personalized and remote eye care, particularly in resource-limited settings. By combining objective performance metrics such as accuracy, sensitivity and specificity with subjective Grad-CAM visualizations, the system offers a comprehensive evaluation framework, ensuring transparency and building trust in ocular healthcare, making it well-suited for clinical adoption.
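To make the pipeline described in the abstract more concrete, the sketch below outlines one plausible implementation in Python using Keras, imbalanced-learn and XGBoost. It is an illustrative approximation only, not the authors' released code: the layer name, hyperparameters, label encoding and the use of the strongest pooled activation as a stand-in for a class score in the Grad-CAM step are assumptions made here (Grad-CAM needs a differentiable score, and the XGB classifier itself is not differentiable).

import tensorflow as tf
from tensorflow.keras.applications import Xception
from tensorflow.keras.applications.xception import preprocess_input
from imblearn.over_sampling import RandomOverSampler
from xgboost import XGBClassifier

# 1) Feature extraction: pre-trained Xception with global average pooling,
#    yielding one 2048-dimensional vector per fundus image.
backbone = Xception(weights="imagenet", include_top=False,
                    pooling="avg", input_shape=(299, 299, 3))

def extract_features(images):
    # images: (N, 299, 299, 3) RGB array in [0, 255]; preprocess_input rescales to [-1, 1].
    return backbone.predict(preprocess_input(images.astype("float32")), verbose=0)

# 2) Class balancing: random over-sampling equalises per-class counts in the training set.
def balance(features, labels):
    return RandomOverSampler(random_state=42).fit_resample(features, labels)

# 3) Classification: XGBoost on the deep features (integer labels for the
#    AMD/cataract/DR/glaucoma classes, plus a normal class if present).
clf = XGBClassifier(objective="multi:softprob", eval_metric="mlogloss",
                    n_estimators=300, learning_rate=0.1)
# Typical usage with hypothetical arrays:
#   feats = extract_features(train_images)
#   feats_bal, labels_bal = balance(feats, train_labels)
#   clf.fit(feats_bal, labels_bal)

# 4) Grad-CAM on the Xception backbone: heatmap from the last separable-conv block.
#    Because gradients cannot flow through XGB, the strongest pooled feature is used
#    as a proxy score; the paper's exact formulation may differ.
def grad_cam(image, last_conv="block14_sepconv2_act"):
    grad_model = tf.keras.Model(
        backbone.inputs,
        [backbone.get_layer(last_conv).output, backbone.output])
    with tf.GradientTape() as tape:
        conv_out, pooled = grad_model(preprocess_input(image[None].astype("float32")))
        score = tf.reduce_max(pooled)                      # proxy for a class score
    grads = tape.gradient(score, conv_out)
    weights = tf.reduce_mean(grads, axis=(0, 1, 2))        # per-channel importance
    cam = tf.nn.relu(tf.reduce_sum(conv_out[0] * weights, axis=-1))
    return (cam / (tf.reduce_max(cam) + 1e-8)).numpy()     # 10x10 map; upsample to overlay

The returned heatmap can be resized to the input resolution and blended over the fundus image to highlight the regions that drove the backbone's response, mirroring the explainability step described above.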
About the journal:
Image and Vision Computing has as a primary aim the provision of an effective medium of interchange for the results of high quality theoretical and applied research fundamental to all aspects of image interpretation and computer vision. The journal publishes work that proposes new image interpretation and computer vision methodology or addresses the application of such methods to real world scenes. It seeks to strengthen a deeper understanding in the discipline by encouraging the quantitative comparison and performance evaluation of the proposed methodology. The coverage includes: image interpretation, scene modelling, object recognition and tracking, shape analysis, monitoring and surveillance, active vision and robotic systems, SLAM, biologically-inspired computer vision, motion analysis, stereo vision, document image understanding, character and handwritten text recognition, face and gesture recognition, biometrics, vision-based human-computer interaction, human activity and behavior understanding, data fusion from multiple sensor inputs, image databases.