MaxGlaViT: A Novel Lightweight Vision Transformer-Based Approach for Early Diagnosis of Glaucoma Stages From Fundus Images

IF 2.5 · Zone 4 (Computer Science) · JCR Q2 · ENGINEERING, ELECTRICAL & ELECTRONIC

International Journal of Imaging Systems and Technology Pub Date : 2025-07-19 DOI:10.1002/ima.70159

Mustafa Yurdakul, Kübra Uyar, Şakir Taşdemir

Volume 35, Issue 4 · Journal Article · Not open access · Indexed via Semantic Scholar · Full text: https://onlinelibrary.wiley.com/doi/10.1002/ima.70159

Citations: 0



MaxGlaViT: A Novel Lightweight Vision Transformer-Based Approach for Early Diagnosis of Glaucoma Stages From Fundus Images

Glaucoma is a prevalent eye disease that often progresses without symptoms and can lead to permanent vision loss if not detected early. The limited number of specialists and overcrowded clinics worldwide make early detection difficult. Deep learning-based computer-aided diagnosis (CAD) systems can address this problem by enabling faster and more accurate diagnosis. In this study, we propose MaxGlaViT, a novel Vision Transformer model based on MaxViT, to diagnose different stages of glaucoma. The architecture of the model is constructed in three steps: (i) the Multi-Axis Vision Transformer (MaxViT) structure is scaled in terms of the number of blocks and channels, (ii) low-level feature extraction is improved by integrating an attention mechanism into the stem block, and (iii) high-level feature extraction is improved by using a modern convolutional structure. The MaxGlaViT model was tested on the HDV1 fundus image data set and compared with a total of 80 deep learning models. The results show that MaxGlaViT, with its effective block structures, outperforms methods previously reported in the literature in both parameter efficiency and classification accuracy. The model is particularly successful at detecting the early stages of glaucoma. MaxGlaViT is an effective solution for multistage diagnosis of glaucoma, combining low computational cost with high accuracy. In this respect, it can be considered a candidate for a scalable and reliable CAD system applicable in clinical settings.
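Step (i) of the abstract, scaling the number of blocks and channels, is what drives the parameter-efficiency claim. The sketch below is only a rough illustration of that relationship: it counts parameters for a plain stack of 3x3 convolutions used as a stand-in, not for MaxViT or MaxGlaViT. The names `StageConfig`, `conv_params`, and `estimate_params`, and every number in the two configurations, are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class StageConfig:
    """Hypothetical per-stage setting: number of blocks and channel width."""
    blocks: int
    channels: int


def conv_params(c_in: int, c_out: int, kernel: int = 3) -> int:
    """Weights plus biases of one kernel x kernel convolution."""
    return c_in * c_out * kernel * kernel + c_out


def estimate_params(stages: Sequence[StageConfig], in_channels: int = 3) -> int:
    """Rough parameter count of a plain conv stack (a stand-in, not MaxViT)."""
    total = 0
    prev = in_channels
    for stage in stages:
        # The first block of a stage changes the channel count; the rest keep it.
        total += conv_params(prev, stage.channels)
        total += (stage.blocks - 1) * conv_params(stage.channels, stage.channels)
        prev = stage.channels
    return total


# Illustrative values only; these are NOT the configurations from the paper.
base = [StageConfig(2, 64), StageConfig(2, 128), StageConfig(5, 256), StageConfig(2, 512)]
slim = [StageConfig(s.blocks, s.channels // 2) for s in base]

if __name__ == "__main__":
    print("base:", estimate_params(base))
    print("slim:", estimate_params(slim))
```

In this toy setting, the cost of a block grows roughly with the square of its channel width, so halving the width shrinks the count far more than removing one block does. The actual trade-off in MaxGlaViT depends on its attention and convolution blocks, which the abstract does not specify.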


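Source journal

International Journal of Imaging Systems and Technology (Engineering & Technology: Imaging Science & Photographic Technology)

CiteScore: 6.90

Self-citation rate: 6.10%

Articles published: 138

Review time: 3 months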

About the journal: The International Journal of Imaging Systems and Technology (IMA) is a forum for the exchange of ideas and results relevant to imaging systems, including imaging physics and informatics. The journal covers all imaging modalities in humans and animals. IMA accepts technically sound and scientifically rigorous research in the interdisciplinary field of imaging, including relevant algorithmic research and hardware and software development, and their applications relevant to medical research. The journal provides a platform to publish original research in structural and functional imaging. The journal is also open to imaging studies of the human body and of animals that describe novel diagnostic imaging and analysis methods. Technical, theoretical, and clinical research in both normal and clinical populations is encouraged. Submissions describing methods, software, databases, replication studies, as well as negative results are also considered. The scope of the journal includes, but is not limited to, the following in the context of biomedical research: Imaging and neuro-imaging modalities: structural MRI, functional MRI, PET, SPECT, CT, ultrasound, EEG, MEG, NIRS, etc.; Neuromodulation and brain stimulation techniques such as TMS and tDCS; Software and hardware for imaging, especially related to human and animal health; Image segmentation in normal and clinical populations; Pattern analysis and classification using machine learning techniques; Computational modeling and analysis; Brain connectivity and connectomics; Systems-level characterization of brain function; Neural networks and neurorobotics; Computer vision, based on human/animal physiology; Brain-computer interface (BCI) technology; Big data, databasing and data mining.