Diabetic Retinopathy Image Classification Using Shift Window Transformer
Rasha Ali Dihin, Waleed A Mahmoud Al-Jawher, Ebtesam N AlShemmary
DOI: 10.11113/ijic.v13n1-2.415
International Journal of Innovative Computing Information and Control, published 2023-09-13
Citations: 0
Abstract
Diabetic retinopathy is one of the most dangerous complications of diabetes and can lead to blindness if not diagnosed early; early diagnosis, however, makes it possible to control the disease and prevent progression to blindness. Transformers are state-of-the-art models in natural language processing that do not use convolutional layers; instead, multi-head attention mechanisms capture long-range contextual relations, here between image pixels. CNNs currently dominate deep learning solutions for grading diabetic retinopathy, but the benefits of transformers led us to propose a transformer-based method for recognizing diabetic retinopathy grades. A major objective of this research is to demonstrate that a pure attention mechanism can be used to detect diabetic retinopathy and that transformers can replace standard CNNs in identifying its severity grades. In this study, a Swin Transformer-based technique for diagnosing diabetic retinopathy is presented: fundus images are divided into non-overlapping patches, flattened, and mapped to tokens by a linear embedding while positional information is preserved by a positional embedding. The resulting token sequence is fed through several multi-head attention layers to construct the final representation. In the classification step, the token sequence is passed to a softmax classification layer, which produces the recognition output. The Swin Transformer was trained and tested on the APTOS 2019 Kaggle dataset using fundus images of different resolutions and patch sizes. For a 160×160 image size with patch size 2 and embedding dimension C=64, the test accuracy, test loss, and test top-2 accuracy were 69.44%, 1.13, and 78.33%, respectively; with patch size 4 and embedding dimension C=96, they were 68.85%, 1.12, and 79.96%. For a 224×224 image size with patch size 2 and embedding dimension C=64, the test accuracy was 72.5%, the test loss 1.07, and the test top-2 accuracy 83.7%; with patch size 4 and embedding dimension C=96, the test accuracy was 74.51%, the test loss 1.02, and the test top-2 accuracy 85.3%. The results also showed that the Swin Transformer achieves flexible memory savings. The proposed method highlights that an attention mechanism based on the Swin Transformer model is promising for the diabetic retinopathy grade recognition task.
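For readers who want a concrete picture of the pipeline the abstract describes (patch partition, linear and positional embedding, multi-head attention layers, softmax classification), the sketch below shows one minimal way it could be wired up in PyTorch. It is not the authors' implementation: it uses plain global multi-head attention rather than Swin's shifted-window attention, and the class name PatchAttentionClassifier, its hyperparameters, and the dummy input are illustrative assumptions (the 5 output classes correspond to the APTOS 2019 severity grades 0-4).

```python
# Minimal sketch of the described pipeline: patch partition, linear + positional
# embedding, multi-head attention layers, and a softmax classification head.
# Illustration only: global attention stands in for Swin's shifted-window
# attention, and all hyperparameters below are placeholders, not the paper's.
import torch
import torch.nn as nn


class PatchAttentionClassifier(nn.Module):
    def __init__(self, img_size=224, patch_size=4, embed_dim=96,
                 depth=4, num_heads=4, num_classes=5):
        super().__init__()
        num_patches = (img_size // patch_size) ** 2
        # Non-overlapping patch partition + linear embedding in one step:
        # a convolution with kernel size == stride == patch size.
        self.patch_embed = nn.Conv2d(3, embed_dim,
                                     kernel_size=patch_size, stride=patch_size)
        # Learnable positional embedding keeps the patches' spatial order.
        self.pos_embed = nn.Parameter(torch.zeros(1, num_patches, embed_dim))
        # Stack of multi-head self-attention blocks.
        layer = nn.TransformerEncoderLayer(d_model=embed_dim, nhead=num_heads,
                                           dim_feedforward=4 * embed_dim,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)
        self.norm = nn.LayerNorm(embed_dim)
        # Classification head over the 5 APTOS severity grades (0-4).
        self.head = nn.Linear(embed_dim, num_classes)

    def forward(self, x):                      # x: (B, 3, H, W)
        x = self.patch_embed(x)                # (B, C, H/p, W/p)
        x = x.flatten(2).transpose(1, 2)       # (B, N, C) token sequence
        x = x + self.pos_embed                 # add positional information
        x = self.encoder(x)                    # multi-head attention layers
        x = self.norm(x).mean(dim=1)           # pool tokens into one vector
        return self.head(x)                    # logits; softmax at inference


if __name__ == "__main__":
    model = PatchAttentionClassifier()
    fundus = torch.randn(2, 3, 224, 224)       # dummy batch of fundus images
    probs = torch.softmax(model(fundus), dim=-1)
    print(probs.shape)                         # torch.Size([2, 5])
```

In practice, a library implementation of the Swin Transformer (e.g. timm's swin_tiny_patch4_window7_224 created with num_classes=5) would supply the shifted-window stages and hierarchical feature maps that this simplified sketch omits.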
Journal Introduction:
The primary aim of the International Journal of Innovative Computing, Information and Control (IJICIC) is to publish high-quality papers on new developments and trends, novel techniques and approaches, and innovative methodologies and technologies in the theory and applications of intelligent systems, information and control. The IJICIC is a peer-reviewed English-language journal published bimonthly.