Ramy Abdel Mawgoud, Christian Posch
JEADV Clinical Practice 4(1): 145–155. Published 2024-11-13. DOI: 10.1002/jvc2.580. Open-access PDF: https://onlinelibrary.wiley.com/doi/epdf/10.1002/jvc2.580


Visual explainability of 250 skin diseases viewed through the eyes of an AI-based, self-supervised vision transformer—A clinical perspective


Background

Conventional supervised deep-learning approaches mostly focus on a small range of skin disease images. Recently, self-supervised (SS) Vision Transformers have emerged, capturing complex visual patterns in hundreds of classes without any need for tedious image annotation.

Objectives

This study aimed to form the basis for an inexpensive, explainable AI system covering the breadth of clinical skin diagnoses by comparing the so-called ‘self-attention maps’—visualizations showing the model's areas of interest for each skin disease—of a self-supervised and a supervised ViT across 250 skin diseases.

Methods

Using a public data set containing images of 250 different skin diseases, one small ViT was pretrained self-supervised for 300 epochs (=ViT-SS), and two were fine-tuned supervised from ImageNet weights, for 300 epochs (=ViT-300) and, with heavier regularization, for 78 epochs (=ViT-78), respectively. Each model generated 250 self-attention maps. These maps were analyzed in a blinded manner using a ‘DermAttention’ score, and the models were primarily compared on their ability to focus on disease-relevant features.
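A self-attention map of the kind described here is typically obtained by taking the [CLS] token's attention over the image patches in a transformer block and reshaping it onto the patch grid. The following is a minimal NumPy sketch of that step only, not the authors' pipeline; the head count, grid size, and random inputs are illustrative assumptions.

```python
import numpy as np

def cls_attention_map(q, k, grid=(14, 14)):
    """Turn one block's [CLS]-token attention into a per-patch map.

    q, k: arrays of shape (heads, tokens, dim); token 0 is [CLS],
    the remaining tokens are image patches in row-major grid order.
    """
    d = q.shape[-1]
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d)   # (heads, tokens, tokens)
    scores -= scores.max(axis=-1, keepdims=True)     # numerical stability
    attn = np.exp(scores) / np.exp(scores).sum(axis=-1, keepdims=True)
    # [CLS] row, patch columns only; average across heads.
    cls_to_patches = attn[:, 0, 1:].mean(axis=0)
    return cls_to_patches.reshape(grid)

# Illustrative shapes: 6 heads, 14x14 patch grid, 64-dim head.
rng = np.random.default_rng(0)
heads, tokens, dim = 6, 1 + 14 * 14, 64
q = rng.standard_normal((heads, tokens, dim))
k = rng.standard_normal((heads, tokens, dim))
amap = cls_attention_map(q, k)
print(amap.shape)  # (14, 14) map, ready to upsample and overlay on the image
```

In practice the map is upsampled to the input resolution and overlaid on the clinical photograph, which is what allows raters to judge whether attention falls on the disease-defining lesion.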

Results

Visual analysis revealed that ViT-SS delivered superior self-attention maps. It scored a significantly higher accuracy in focusing on disease-defining lesions (88%; 95% confidence interval [CI]: 0.840–0.920) compared with ViT-300 (78.4%; 95% CI: 0.733–0.835; p < 0.05) and ViT-78 (51.2%; 95% CI: 0.450–0.574; p < 0.05). It also outperformed the supervised models in the other subcategories of ‘DermAttention’.
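The reported intervals are consistent with a simple normal-approximation (Wald) interval for a proportion over n = 250 rated maps (one per disease); the sample size and the Wald formula are inferences from the numbers, not stated explicitly in the abstract. A short sketch reproducing them:

```python
from math import sqrt

def wald_ci(p, n, z=1.96):
    """95% normal-approximation (Wald) confidence interval for a proportion."""
    se = sqrt(p * (1 - p) / n)
    return round(p - z * se, 3), round(p + z * se, 3)

# Reported accuracies; n = 250 maps per model is an assumption.
for name, p in [("ViT-SS", 0.880), ("ViT-300", 0.784), ("ViT-78", 0.512)]:
    lo, hi = wald_ci(p, 250)
    print(f"{name}: {p:.1%} (95% CI {lo}-{hi})")
```

Running this recovers 0.840–0.920, 0.733–0.835, and 0.450–0.574, matching the intervals quoted above.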

Conclusions

SS pretraining did not translate into better diagnostic performance compared with conventional supervised training. However, it led to more accurate visual representations of varying skin disease images. These findings may pave the way for large-scale, explainable computer-aided skin diagnostics in an unfiltered clinical setting. Further research is needed to improve clinical outcomes using these visual tools.
