{"title":"基于Grad-CAM的Xception网多眼疾病检测解释","authors":"M. Raveenthini , R. Lavanya , Raul Benitez","doi":"10.1016/j.imavis.2025.105419","DOIUrl":null,"url":null,"abstract":"<div><div>Age-related macular degeneration (AMD), cataract, diabetic retinopathy (DR) and glaucoma are the four most common ocular conditions that lead to vision loss. Early detection in asymptomatic stages is necessary to alleviate vision loss. Manual diagnosis is costly, tedious, laborious and burdensome; assistive tools such as computer aided diagnosis (CAD) systems can help to alleviate these issues. Existing CAD systems for ocular diseases primarily address a single disease condition, employing disease-specific algorithms that rely on anatomical and morphological characteristics for localization of regions of interest (ROIs). The dependence on exhaustive image processing algorithms for pre-processing, ROI detection and feature extraction often results in overly complex systems prone to errors that affect classifier performance. Conglomerating many such individual diagnostic frameworks, each targeting a single disease, is not a practical solution for detecting multiple ocular diseases, especially in mass screening. Alternatively, a single generic CAD framework modeled as a multiclass problem serves to be useful in such high throughput scenarios, significantly reducing cost, time and manpower. Nevertheless, ambiguities in the overlapping features of multiple classes representing different diseases should be effectively addressed. This paper proposes a segmentation-independent approach based on deep learning (DL) to realize a single framework for the detection of different ocular conditions. The proposed work alleviates the need for pixel-level operations and segmentation techniques specific to different ocular diseases, offering a solution that has an upper hand compared to conventional systems in terms of complexity and accuracy. Further, explainability is incorporated as a value-addition that assures trust and confidence in the model. The system involves automatic feature extraction from full fundus images using Xception, a pre-trained deep model. Xception utilizes depthwise separable convolutions to capture subtle patterns in fundus images, effectively addressing the similarities between clinical indicators, such as drusen in AMD and exudates in DR, which often lead to misdiagnosis. A random over-sampling technique is performed to address class imbalance by equalizing the number of training samples across the classes. These features are fed to extreme gradient boosting (XGB) for classification. This study further aims to unveil the “black box” paradigm of model classification, by leveraging gradient-weighted class activation mapping (Grad-CAM) technique to highlight relevant ROIs. The combination of Xception based feature extraction and XGB classification results in 99.31% accuracy, 99.5% sensitivity, 99.8% specificity, 99.4% F1-score and 99.4% precision. The proposed system can be a promising tool aiding conventional manual screening in primary health care centres and mass screening scenarios for efficiently diagnosing multiple ocular diseases, enhancing personalized and remote eye care, particularly in resource-limited settings. 
By combining objective performance metrics such as accuracy, sensitivity, and specificity with subjective Grad-CAM visualizations, the system offers a comprehensive evaluation framework, ensuring transparency and building trust in ocular healthcare, making it well-suited for clinical adoption.</div></div>","PeriodicalId":50374,"journal":{"name":"Image and Vision Computing","volume":"154 ","pages":"Article 105419"},"PeriodicalIF":4.2000,"publicationDate":"2025-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Grad-CAM based explanations for multiocular disease detection using Xception net\",\"authors\":\"M. Raveenthini , R. Lavanya , Raul Benitez\",\"doi\":\"10.1016/j.imavis.2025.105419\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><div>Age-related macular degeneration (AMD), cataract, diabetic retinopathy (DR) and glaucoma are the four most common ocular conditions that lead to vision loss. Early detection in asymptomatic stages is necessary to alleviate vision loss. Manual diagnosis is costly, tedious, laborious and burdensome; assistive tools such as computer aided diagnosis (CAD) systems can help to alleviate these issues. Existing CAD systems for ocular diseases primarily address a single disease condition, employing disease-specific algorithms that rely on anatomical and morphological characteristics for localization of regions of interest (ROIs). The dependence on exhaustive image processing algorithms for pre-processing, ROI detection and feature extraction often results in overly complex systems prone to errors that affect classifier performance. Conglomerating many such individual diagnostic frameworks, each targeting a single disease, is not a practical solution for detecting multiple ocular diseases, especially in mass screening. Alternatively, a single generic CAD framework modeled as a multiclass problem serves to be useful in such high throughput scenarios, significantly reducing cost, time and manpower. Nevertheless, ambiguities in the overlapping features of multiple classes representing different diseases should be effectively addressed. This paper proposes a segmentation-independent approach based on deep learning (DL) to realize a single framework for the detection of different ocular conditions. The proposed work alleviates the need for pixel-level operations and segmentation techniques specific to different ocular diseases, offering a solution that has an upper hand compared to conventional systems in terms of complexity and accuracy. Further, explainability is incorporated as a value-addition that assures trust and confidence in the model. The system involves automatic feature extraction from full fundus images using Xception, a pre-trained deep model. Xception utilizes depthwise separable convolutions to capture subtle patterns in fundus images, effectively addressing the similarities between clinical indicators, such as drusen in AMD and exudates in DR, which often lead to misdiagnosis. A random over-sampling technique is performed to address class imbalance by equalizing the number of training samples across the classes. These features are fed to extreme gradient boosting (XGB) for classification. This study further aims to unveil the “black box” paradigm of model classification, by leveraging gradient-weighted class activation mapping (Grad-CAM) technique to highlight relevant ROIs. 
The combination of Xception based feature extraction and XGB classification results in 99.31% accuracy, 99.5% sensitivity, 99.8% specificity, 99.4% F1-score and 99.4% precision. The proposed system can be a promising tool aiding conventional manual screening in primary health care centres and mass screening scenarios for efficiently diagnosing multiple ocular diseases, enhancing personalized and remote eye care, particularly in resource-limited settings. By combining objective performance metrics such as accuracy, sensitivity, and specificity with subjective Grad-CAM visualizations, the system offers a comprehensive evaluation framework, ensuring transparency and building trust in ocular healthcare, making it well-suited for clinical adoption.</div></div>\",\"PeriodicalId\":50374,\"journal\":{\"name\":\"Image and Vision Computing\",\"volume\":\"154 \",\"pages\":\"Article 105419\"},\"PeriodicalIF\":4.2000,\"publicationDate\":\"2025-02-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Image and Vision Computing\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S0262885625000071\",\"RegionNum\":3,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q2\",\"JCRName\":\"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Image and Vision Computing","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0262885625000071","RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
Grad-CAM based explanations for multiocular disease detection using Xception net
M. Raveenthini, R. Lavanya, Raul Benitez
Image and Vision Computing, Volume 154, Article 105419 (February 2025). DOI: 10.1016/j.imavis.2025.105419
Age-related macular degeneration (AMD), cataract, diabetic retinopathy (DR) and glaucoma are the four most common ocular conditions that lead to vision loss. Early detection in asymptomatic stages is necessary to mitigate vision loss. Manual diagnosis is costly, tedious and labour-intensive; assistive tools such as computer-aided diagnosis (CAD) systems can help to alleviate these issues. Existing CAD systems for ocular diseases primarily address a single disease condition, employing disease-specific algorithms that rely on anatomical and morphological characteristics for localization of regions of interest (ROIs). The dependence on exhaustive image processing algorithms for pre-processing, ROI detection and feature extraction often results in overly complex systems prone to errors that affect classifier performance. Combining many such individual diagnostic frameworks, each targeting a single disease, is not a practical solution for detecting multiple ocular diseases, especially in mass screening. Alternatively, a single generic CAD framework modeled as a multiclass problem proves useful in such high-throughput scenarios, significantly reducing cost, time and manpower. Nevertheless, ambiguities in the overlapping features of multiple classes representing different diseases should be effectively addressed. This paper proposes a segmentation-independent approach based on deep learning (DL) to realize a single framework for the detection of different ocular conditions. The proposed work obviates the need for pixel-level operations and segmentation techniques specific to different ocular diseases, offering a solution that improves on conventional systems in both complexity and accuracy. Further, explainability is incorporated as a value addition that builds trust and confidence in the model. The system involves automatic feature extraction from full fundus images using Xception, a pre-trained deep model. Xception utilizes depthwise separable convolutions to capture subtle patterns in fundus images, effectively addressing the similarities between clinical indicators, such as drusen in AMD and exudates in DR, which often lead to misdiagnosis. A random over-sampling technique is applied to address class imbalance by equalizing the number of training samples across the classes. These features are fed to an extreme gradient boosting (XGB) classifier. This study further aims to open the “black box” of model classification by leveraging the gradient-weighted class activation mapping (Grad-CAM) technique to highlight relevant ROIs. The combination of Xception-based feature extraction and XGB classification achieves 99.31% accuracy, 99.5% sensitivity, 99.8% specificity, 99.4% F1-score and 99.4% precision. The proposed system can be a promising tool aiding conventional manual screening in primary health care centres and mass screening scenarios for efficiently diagnosing multiple ocular diseases, enhancing personalized and remote eye care, particularly in resource-limited settings. By combining objective performance metrics such as accuracy, sensitivity and specificity with subjective Grad-CAM visualizations, the system offers a comprehensive evaluation framework, ensuring transparency and building trust in ocular healthcare, making it well-suited for clinical adoption.
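To make the pipeline described in the abstract more concrete, the sketch below outlines one plausible implementation in Python using Keras, imbalanced-learn and XGBoost. It is an illustrative approximation only, not the authors' released code: the layer name, hyperparameters, label encoding and the use of the strongest pooled activation as a stand-in for a class score in the Grad-CAM step are assumptions made here (Grad-CAM needs a differentiable score, and the XGB classifier itself is not differentiable).

import tensorflow as tf
from tensorflow.keras.applications import Xception
from tensorflow.keras.applications.xception import preprocess_input
from imblearn.over_sampling import RandomOverSampler
from xgboost import XGBClassifier

# 1) Feature extraction: pre-trained Xception with global average pooling,
#    yielding one 2048-dimensional vector per fundus image.
backbone = Xception(weights="imagenet", include_top=False,
                    pooling="avg", input_shape=(299, 299, 3))

def extract_features(images):
    # images: (N, 299, 299, 3) RGB array in [0, 255]; preprocess_input rescales to [-1, 1].
    return backbone.predict(preprocess_input(images.astype("float32")), verbose=0)

# 2) Class balancing: random over-sampling equalises per-class counts in the training set.
def balance(features, labels):
    return RandomOverSampler(random_state=42).fit_resample(features, labels)

# 3) Classification: XGBoost on the deep features (integer labels for the
#    AMD/cataract/DR/glaucoma classes, plus a normal class if present).
clf = XGBClassifier(objective="multi:softprob", eval_metric="mlogloss",
                    n_estimators=300, learning_rate=0.1)
# Typical usage with hypothetical arrays:
#   feats = extract_features(train_images)
#   feats_bal, labels_bal = balance(feats, train_labels)
#   clf.fit(feats_bal, labels_bal)

# 4) Grad-CAM on the Xception backbone: heatmap from the last separable-conv block.
#    Because gradients cannot flow through XGB, the strongest pooled feature is used
#    as a proxy score; the paper's exact formulation may differ.
def grad_cam(image, last_conv="block14_sepconv2_act"):
    grad_model = tf.keras.Model(
        backbone.inputs,
        [backbone.get_layer(last_conv).output, backbone.output])
    with tf.GradientTape() as tape:
        conv_out, pooled = grad_model(preprocess_input(image[None].astype("float32")))
        score = tf.reduce_max(pooled)                      # proxy for a class score
    grads = tape.gradient(score, conv_out)
    weights = tf.reduce_mean(grads, axis=(0, 1, 2))        # per-channel importance
    cam = tf.nn.relu(tf.reduce_sum(conv_out[0] * weights, axis=-1))
    return (cam / (tf.reduce_max(cam) + 1e-8)).numpy()     # 10x10 map; upsample to overlay

The returned heatmap can be resized to the input resolution and blended over the fundus image to highlight the regions that drove the backbone's response, mirroring the explainability step described above.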
About the journal:
Image and Vision Computing has as a primary aim the provision of an effective medium of interchange for the results of high quality theoretical and applied research fundamental to all aspects of image interpretation and computer vision. The journal publishes work that proposes new image interpretation and computer vision methodology or addresses the application of such methods to real world scenes. It seeks to strengthen a deeper understanding in the discipline by encouraging the quantitative comparison and performance evaluation of the proposed methodology. The coverage includes: image interpretation, scene modelling, object recognition and tracking, shape analysis, monitoring and surveillance, active vision and robotic systems, SLAM, biologically-inspired computer vision, motion analysis, stereo vision, document image understanding, character and handwritten text recognition, face and gesture recognition, biometrics, vision-based human-computer interaction, human activity and behavior understanding, data fusion from multiple sensor inputs, image databases.