{"title":"Diabetic Retinopathy Detection using CNN, Transformer and MLP based Architectures","authors":"N. S. Kumar, Badri Karthikeyan","doi":"10.1109/ISPACS51563.2021.9651024","DOIUrl":null,"url":null,"abstract":"Diabetic retinopathy is a chronic disease caused due to a long term accumulation of insulin in the retinal blood vessels. 2.6% of global blindness is a result of diabetic retinopathy (DR) with more than 150 million people affected. Early detection of DR plays an important role in preventing blindness. Use of deep learning is a long term solution to screen, diagnose and monitor patients within primary health centers. Attention based networks (Transformers), Convolutional neural networks (CNN) and multi-layered perceptrons (MPLs) are the current state-of-the-art architectures for addressing computer vision based problem statements. In this paper, we evaluate these three different architectures for the detection of DR. Model convegence time (training time), accuracy, model size are few of the metrics that have been used for this evaluation. State-of-the-art pre-trained models belonging to each of these architectures have been chosen for these experiments. The models include EfficientNet, ResNet, Swin-Transformer, Vision-Transformer (ViT) and MLP-Mixer. These models have been trained using Kaggle dataset, which contains more than 3600 annotated images with a resolution of 2416*1736. For fair comparisons, no augmentation techniques have been used to improve the performance. Results of the experiments indicate that the models based on Transformer based architecture are the most accurate and also have comparative model-convergence times compared to CNN and MLP architectures. Among all the state-of-the-art pre-trained models Swin-Transformer yields the best accuracy of 86.4% on test dataset and it takes around 12 minutes for training the model on a Tesla K80 GPU.","PeriodicalId":359822,"journal":{"name":"2021 International Symposium on Intelligent Signal Processing and Communication Systems (ISPACS)","volume":"168 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-11-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"7","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2021 International Symposium on Intelligent Signal Processing and Communication Systems (ISPACS)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ISPACS51563.2021.9651024","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 7
Abstract
Diabetic retinopathy is a chronic disease caused by long-term damage to the retinal blood vessels from elevated blood glucose. Diabetic retinopathy (DR) accounts for 2.6% of global blindness, with more than 150 million people affected. Early detection of DR plays an important role in preventing blindness, and deep learning offers a long-term solution for screening, diagnosing and monitoring patients within primary health centers. Attention-based networks (Transformers), convolutional neural networks (CNNs) and multi-layer perceptrons (MLPs) are the current state-of-the-art architectures for computer vision problems. In this paper, we evaluate these three architectures for the detection of DR. Model convergence time (training time), accuracy and model size are among the metrics used for this evaluation. State-of-the-art pre-trained models belonging to each of these architectures have been chosen for the experiments: EfficientNet, ResNet, Swin-Transformer, Vision Transformer (ViT) and MLP-Mixer. These models have been trained on the Kaggle dataset, which contains more than 3600 annotated images with a resolution of 2416×1736. For a fair comparison, no augmentation techniques were used to improve performance. The experimental results indicate that the Transformer-based models are the most accurate and also have model-convergence times comparable to the CNN and MLP architectures. Among all the state-of-the-art pre-trained models, the Swin-Transformer yields the best accuracy of 86.4% on the test dataset and takes around 12 minutes to train on a Tesla K80 GPU.
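The abstract does not specify the implementation details of the comparison. The sketch below is a hypothetical illustration, assuming PyTorch and the timm library, of how the named pre-trained backbones from the three architecture families could be instantiated with a DR classification head and compared on the model-size metric; the model identifiers and the 5-class DR grading are assumptions, not taken from the paper.

```python
# Hypothetical sketch (not the authors' code): load ImageNet pre-trained backbones
# from the three architecture families via timm and report a simple model-size metric.
import time

import timm
import torch

# DR severity is commonly graded into 5 classes (0 = no DR ... 4 = proliferative DR);
# the paper does not state its label scheme, so this is an assumption.
NUM_CLASSES = 5

# Representative pre-trained models for each architecture family named in the abstract.
BACKBONES = {
    "CNN / EfficientNet":    "efficientnet_b0",
    "CNN / ResNet":          "resnet50",
    "Transformer / Swin":    "swin_base_patch4_window7_224",
    "Transformer / ViT":     "vit_base_patch16_224",
    "MLP / MLP-Mixer":       "mixer_b16_224",
}


def build_model(name: str) -> torch.nn.Module:
    """Load a pre-trained backbone and replace its head for DR classification."""
    return timm.create_model(name, pretrained=True, num_classes=NUM_CLASSES)


def model_size_mb(model: torch.nn.Module) -> float:
    """Model size: total parameter count in megabytes, assuming float32 weights."""
    return sum(p.numel() for p in model.parameters()) * 4 / 1e6


if __name__ == "__main__":
    for family, name in BACKBONES.items():
        start = time.time()
        model = build_model(name)
        print(f"{family:22s} {name:30s} "
              f"{model_size_mb(model):8.1f} MB  (load time: {time.time() - start:.1f} s)")
```

Fine-tuning each model on the fundus images and timing the training loop would then give the convergence-time and accuracy comparison the abstract describes; those steps are omitted here since the paper's training hyperparameters are not given.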