EffiViT: Hybrid CNN-Transformer for Retinal Imaging
Authors: Rajatha, D.V. Ashoka
Journal: Computers in Biology and Medicine, vol. 191, Article 110164
Publication date: 2025-04-17
DOI: 10.1016/j.compbiomed.2025.110164
URL: https://www.sciencedirect.com/science/article/pii/S0010482525005153
Impact factor: 7.0 (JCR Q1, Biology)
Citations: 0
Abstract
The human eye is a vital sensory organ that is crucial for visual perception. The retina, a key component of the eye, is responsible for generating visual signals. Because of its characteristics, the retina can reveal the presence of ocular diseases. Early detection and automated diagnosis of retinal disease are therefore crucial for preventing both temporary and permanent blindness.
This work introduces a framework that combines the complementary strengths of EfficientNet-B4 and Vision Transformers for attention-driven analysis, offering a promising tool for ophthalmic healthcare. Rather than a conventional hybrid, the framework uses EfficientNet-B4 as a multiscale feature encoder that produces discriminative feature maps preserving both local and intermediate contextual information. A Vision Transformer is then incorporated to capture and model global dependencies through its attention mechanisms. The combination captures intricate patterns and focuses on the pertinent regions of the image, enabling precise and reliable classification.
The proposed model achieved an AUC of 0.9466, an mAP of 0.7865, an F1-score of 0.75, and a Model Score of 0.8665. This is a 5.17% increase in the overall score over previous state-of-the-art methods on the same task. The improvement underscores the effectiveness of the hybrid model in capturing both local and global contextual information, making it a robust and reliable tool for precise retinal diagnosis.
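The abstract does not define the Model Score. The reported figures are, however, consistent with it being the arithmetic mean of AUC and mAP. The short check below works under that assumption; the function name `mean_score` is hypothetical.

```python
def mean_score(auc: float, m_ap: float) -> float:
    """Arithmetic mean of AUC and mAP (assumed definition of the Model Score)."""
    return (auc + m_ap) / 2


# Values reported in the abstract.
score = mean_score(0.9466, 0.7865)
print(f"{score:.5f}")  # compare with the reported Model Score of 0.8665
```

The mean comes out at about 0.86655, which agrees with the reported 0.8665 up to rounding in the fourth decimal place.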
About the journal:
Computers in Biology and Medicine is an international forum for sharing groundbreaking advancements in the use of computers in bioscience and medicine. This journal serves as a medium for communicating essential research, instruction, ideas, and information regarding the rapidly evolving field of computer applications in these domains. By encouraging the exchange of knowledge, we aim to facilitate progress and innovation in the utilization of computers in biology and medicine.