Fernando Korn Malerbi, Luis Filipe Nakayama, Paulo Prado, Fernando Yamanaka, Gustavo Barreto Melo, Caio Vinicius Regatieri, José Augusto Stuchi
Annals of Translational Medicine, 2024; 12(5): 89. Published 2024-10-20 (Epub 2024-10-12). DOI: https://doi.org/10.21037/atm-24-73. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11534741/pdf/
Heatmap analysis for artificial intelligence explainability in diabetic retinopathy detection: illuminating the rationale of deep learning decisions.
Background: The opacity of the decision processes of artificial intelligence (AI) algorithms limits their application in healthcare. Our objective was to explore discrepancies between heatmaps derived from slightly different retinal images of the same eyes of individuals with diabetes, to gain insight into the deep learning (DL) decision process.
Methods: Pairs of retinal images from the same eyes of individuals with diabetes, each pair consisting of one image obtained before and one after pupil dilation, were automatically analyzed by a convolutional neural network for the presence of diabetic retinopathy (DR), with the output being a score ranging from 0 to 1. Gradient-weighted Class Activation Mapping (Grad-CAM) allowed visualization of the activated areas. Pairs with discordant DL scores or discordant DL outputs within the pair were objectively compared with the concordant pairs on three variables: the sum of activations of the Class Activation Mapping (CAM), the number of activated areas, and the DL score difference. Heatmaps of discordant pairs were also assessed qualitatively.
Results: For the detection of DR, the algorithm attained 89.8% sensitivity, 96.3% specificity, and an area under the receiver operating characteristic (ROC) curve of 0.95. Out of 210 comparable pairs of images, 20 eyes were considered discordant according to DL score difference and 10 eyes according to DL output. Comparison of the concordant and discordant groups showed statistically significant differences for all objective variables. Qualitative analysis pointed to subtle differences in image quality within discordant pairs.
Conclusions: The relationship established between objective parameters extracted from heatmaps and DL output discrepancies reinforces the role of heatmaps in DL explainability, fostering acceptance of DL systems for clinical use.
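The objective variables named in the Methods (sum of CAM activations, number of activated areas, and DL score difference within a pair) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the activation threshold of 0.5, the decision threshold of 0.5, the score-gap cutoff of 0.2, and the use of 4-connectivity to count areas are all assumptions made for illustration, since the abstract gives no such values.

```python
from collections import deque


def heatmap_metrics(cam, activation_threshold=0.5):
    """Summarize a 2D heatmap given as a list of rows of floats in [0, 1].

    Returns (sum of all activations, number of activated areas).
    An "activated area" is assumed here to be a 4-connected group of
    pixels at or above activation_threshold (a hypothetical choice).
    """
    rows, cols = len(cam), len(cam[0])
    activation_sum = sum(sum(row) for row in cam)

    seen = [[False] * cols for _ in range(rows)]
    n_areas = 0
    for r in range(rows):
        for c in range(cols):
            if seen[r][c] or cam[r][c] < activation_threshold:
                continue
            # New unvisited activated pixel: flood-fill its whole region.
            n_areas += 1
            seen[r][c] = True
            queue = deque([(r, c)])
            while queue:
                y, x = queue.popleft()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and not seen[ny][nx]
                            and cam[ny][nx] >= activation_threshold):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
    return activation_sum, n_areas


def compare_pair(score_a, score_b, decision_threshold=0.5, max_score_gap=0.2):
    """Flag a before/after-dilation pair as discordant in two assumed ways.

    discordant_by_score: the two DL scores differ by more than max_score_gap.
    discordant_by_output: the two scores fall on opposite sides of
    decision_threshold, so the binary DR output differs.
    Both cutoffs are hypothetical placeholders.
    """
    score_gap = abs(score_a - score_b)
    return {
        "score_gap": score_gap,
        "discordant_by_score": score_gap > max_score_gap,
        "discordant_by_output": (score_a >= decision_threshold)
        != (score_b >= decision_threshold),
    }


if __name__ == "__main__":
    # Toy 3x3 heatmap with two separate activated regions.
    toy_cam = [
        [0.9, 0.8, 0.0],
        [0.0, 0.0, 0.0],
        [0.0, 0.0, 0.7],
    ]
    print(heatmap_metrics(toy_cam))
    print(compare_pair(0.62, 0.35))
```

In a comparison of this kind, the per-image metrics from `heatmap_metrics` would be computed for both images of each pair and then contrasted between the concordant and discordant groups identified by `compare_pair`.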
About the journal:
The Annals of Translational Medicine (Ann Transl Med; ATM; Print ISSN 2305-5839; Online ISSN 2305-5847) is an international, peer-reviewed, open-access journal featuring original and observational investigations in the broad fields of laboratory, clinical, and public health research. It aims to provide practical, up-to-date information on significant research from all subspecialties of medicine and to broaden readers' vision and horizons from bench to bedside and from bedside to bench. It was published quarterly (April 2013 to Dec. 2013), then monthly (Jan. 2014 to Feb. 2015), and biweekly from March 2015, and it is openly distributed worldwide. The journal was indexed in PubMed in Sept 2014 and in SCIE in 2018. Specific areas of interest include, but are not limited to, multimodality therapy, epidemiology, biomarkers, imaging, biology, pathology, and technical advances related to medicine. Submissions describing preclinical research with potential for application to human disease are encouraged, as are studies describing preliminary human experimentation with potential to further the understanding of the biological mechanisms underlying disease. Studies describing public health research pertinent to clinical practice, disease diagnosis and prevention, or healthcare policy are also welcome. With a focus on interdisciplinary academic cooperation, ATM aims to expedite the translation of scientific discovery into new or improved standards of management and health outcomes in practice.