{"title":"OcuViT: A Vision Transformer-Based Approach for Automated Diabetic Retinopathy and AMD Classification.","authors":"Faisal Ahmed, M D Joshem Uddin","doi":"10.1007/s10278-025-01676-3","DOIUrl":null,"url":null,"abstract":"<p><p>Early detection and accurate classification of retinal diseases, such as diabetic retinopathy (DR) and age-related macular degeneration (AMD), are essential to preventing vision loss and improving patient outcomes. Traditional methods for analyzing retinal fundus images are often manual, prolonged, and rely on the expertise of the clinician, leading to delays in diagnosis and treatment. Recent advances in machine learning, particularly deep learning, have introduced automated systems to assist in retinal disease detection; however, challenges such as computational inefficiency and robustness still remain. This paper proposes a novel approach that utilizes vision transformers (ViT) through transfer learning to address challenges in ophthalmic diagnostics. Using a pre-trained ViT-Base-Patch16-224 model, we fine-tune it for diabetic retinopathy (DR) and age-related macular degeneration (AMD) classification tasks. To adapt the model for retinal fundus images, we implement a streamlined preprocessing pipeline that converts the images into PyTorch tensors and standardizes them, ensuring compatibility with the ViT architecture and improving model performance. We validated our model, OcuViT, on two datasets. We used the APTOS dataset to perform binary and five-level severity classification and the IChallenge-AMD dataset for grading age-related macular degeneration (AMD). In the five-class DR and AMD grading tasks, OcuViT outperforms all existing CNN- and ViT-based methods across multiple metrics, achieving superior accuracy and robustness. For the binary DR task, it delivers highly competitive performance. These results demonstrate that OcuViT effectively leverages ViT-based transfer learning with an efficient preprocessing pipeline, significantly improving the precision and reliability of automated ophthalmic diagnosis.</p>","PeriodicalId":516858,"journal":{"name":"Journal of imaging informatics in medicine","volume":" ","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2025-09-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of imaging informatics in medicine","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1007/s10278-025-01676-3","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Abstract
Early detection and accurate classification of retinal diseases such as diabetic retinopathy (DR) and age-related macular degeneration (AMD) are essential to preventing vision loss and improving patient outcomes. Traditional analysis of retinal fundus images is often manual, time-consuming, and dependent on clinician expertise, leading to delays in diagnosis and treatment. Recent advances in machine learning, particularly deep learning, have introduced automated systems to assist in retinal disease detection; however, challenges remain in computational efficiency and robustness. This paper proposes a novel approach that applies vision transformers (ViT) through transfer learning to address these challenges in ophthalmic diagnostics. We fine-tune a pre-trained ViT-Base-Patch16-224 model for DR and AMD classification tasks. To adapt the model to retinal fundus images, we implement a streamlined preprocessing pipeline that converts the images into PyTorch tensors and standardizes them, ensuring compatibility with the ViT architecture and improving model performance. We validated our model, OcuViT, on two datasets: the APTOS dataset for binary and five-level DR severity classification, and the IChallenge-AMD dataset for AMD grading. In the five-class DR and AMD grading tasks, OcuViT outperforms all existing CNN- and ViT-based methods across multiple metrics, achieving superior accuracy and robustness. For the binary DR task, it delivers highly competitive performance. These results demonstrate that OcuViT effectively leverages ViT-based transfer learning with an efficient preprocessing pipeline, significantly improving the precision and reliability of automated ophthalmic diagnosis.
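For illustration only, the sketch below shows how the setup described in the abstract could look in PyTorch: a pre-trained ViT-Base-Patch16-224 backbone with its head replaced for five-class DR grading, plus a preprocessing pipeline that resizes fundus images to 224x224, converts them to tensors, and standardizes them. This is not the authors' code; the timm model name, ImageNet normalization statistics, optimizer, and learning rate are assumptions.

```python
# Minimal sketch (not the authors' exact implementation) of fine-tuning a
# pre-trained ViT-Base-Patch16-224 for retinal image grading, with a
# tensor-conversion-and-standardization preprocessing pipeline.
import torch
import timm  # assumed source of the pre-trained ViT checkpoint
from torchvision import transforms

NUM_CLASSES = 5  # five-level DR severity; 2 for binary DR, or AMD grades

# Preprocessing: resize to the 224x224 input the ViT expects, convert to a
# PyTorch tensor, and standardize channel-wise (ImageNet mean/std assumed).
preprocess = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

# Load a pre-trained ViT-Base-Patch16-224 and swap in a task-specific head.
model = timm.create_model("vit_base_patch16_224", pretrained=True,
                          num_classes=NUM_CLASSES)

# Standard fine-tuning skeleton (hyperparameters are placeholders).
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
criterion = torch.nn.CrossEntropyLoss()

def train_step(images: torch.Tensor, labels: torch.Tensor) -> float:
    """One fine-tuning step on a batch of preprocessed fundus images."""
    model.train()
    optimizer.zero_grad()
    logits = model(images)            # shape: (batch, NUM_CLASSES)
    loss = criterion(logits, labels)
    loss.backward()
    optimizer.step()
    return loss.item()
```

In practice, `preprocess` would be attached to the APTOS or IChallenge-AMD dataset loader so every fundus image reaches the ViT already resized and standardized; the same backbone can be re-headed with `NUM_CLASSES = 2` for the binary DR task.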