EffiViT: Hybrid CNN-Transformer for Retinal Imaging

IF 7 | CAS Zone 2 (Medicine) | JCR Q1 (Biology)
Rajatha, D.V. Ashoka
Computers in Biology and Medicine, Volume 191, Article 110164. Published 2025-04-17.
DOI: 10.1016/j.compbiomed.2025.110164
https://www.sciencedirect.com/science/article/pii/S0010482525005153

Abstract

The human eye is a vital sensory organ, essential for visual perception. The retina is a key component of the eye and is responsible for generating visual signals. Because of these characteristics, changes in the retina can reveal the onset of ocular diseases. Early detection and automated diagnosis of retinal disease are therefore crucial for preventing both temporary and permanent blindness.
This work introduces a comprehensive framework that combines the complementary strengths of EfficientNet-B4 and Vision Transformers for attention-driven analysis of retinal images, offering a promising tool for advanced ophthalmic healthcare. The framework goes beyond conventional hybridization by repurposing EfficientNet-B4 as a multiscale feature encoder that produces discriminative feature maps preserving both local and intermediate contextual information. A Vision Transformer is then applied to these features, using its attention mechanism to capture and model global dependencies across the image. Together, the two components capture intricate patterns and focus on the most relevant regions of the image, enabling precise and reliable classification.
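The encoder-then-attention pipeline described above can be illustrated with a small, framework-free sketch. It is not the authors' implementation: random arrays stand in for two EfficientNet-B4 feature maps at different resolutions (their channel counts and spatial sizes are illustrative assumptions), and the helper names to_tokens and self_attention are hypothetical. Each spatial position becomes a token, both scales are projected to a shared width, and single-head scaled dot-product self-attention lets every token draw on every other token, which is how attention models global dependencies. The per-label sigmoid head is also an assumption, suggested only by the fact that mAP is reported.

```python
import numpy as np

def to_tokens(feature_map, proj):
    """Flatten a (C, H, W) feature map into (H*W, D) tokens.

    Multiplying every spatial position by the same (C, D) matrix is
    equivalent to a 1x1 convolution followed by flattening.
    """
    c, h, w = feature_map.shape
    tokens = feature_map.reshape(c, h * w).T  # (H*W, C): one token per position
    return tokens @ proj                      # (H*W, D)

def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention over all tokens."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(q.shape[-1])         # (N, N) pairwise similarities
    scores -= scores.max(axis=-1, keepdims=True)    # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax: each row sums to 1
    return weights @ v, weights                     # every token mixes all N tokens

rng = np.random.default_rng(0)
d, num_classes = 32, 5  # hypothetical embedding width and label count

# Random stand-ins for two encoder stages; shapes are illustrative only.
fine = rng.standard_normal((160, 8, 8))    # higher resolution: more local detail
coarse = rng.standard_normal((448, 4, 4))  # lower resolution: wider context

# Project both scales to a shared width and concatenate into one token sequence.
tokens = np.concatenate([
    to_tokens(fine, rng.standard_normal((160, d)) / np.sqrt(160)),
    to_tokens(coarse, rng.standard_normal((448, d)) / np.sqrt(448)),
])  # (64 + 16, d)

w_q, w_k, w_v = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(3))
out, attn = self_attention(tokens, w_q, w_k, w_v)

# Mean-pool the attended tokens and apply one sigmoid per label (multi-label head).
logits = out.mean(axis=0) @ (rng.standard_normal((d, num_classes)) / np.sqrt(d))
probs = 1.0 / (1.0 + np.exp(-logits))

print(out.shape, attn.shape, probs.shape)  # (80, 32) (80, 80) (5,)
```

A full Vision Transformer layer would add multiple heads, positional embeddings, residual connections, layer normalization, and an MLP block, and all weights would be learned rather than random; in practice such a model is usually built with a framework module such as PyTorch's torch.nn.TransformerEncoder rather than with NumPy.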
The proposed model achieved an AUC of 0.9466, a mAP of 0.7865, an F1-score of 0.75, and a Model Score of 0.8665. This represents a 5.17% increase in the overall score over the previous state-of-the-art methods on the same task. The improvement underscores the hybrid model's effectiveness at capturing both local and global contextual information, making it a robust and reliable tool for precise retinal diagnosis.
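The abstract does not define the Model Score. One plausible reading, offered here only as an assumption, is the arithmetic mean of AUC and mAP. The short check below shows that the reported figures are consistent with that reading once rounding to four decimal places is taken into account.

```python
import math

# Values reported in the abstract, each rounded to 4 decimal places.
auc, map_score, reported_score = 0.9466, 0.7865, 0.8665

# Assumption (not stated in the abstract): Model Score = (AUC + mAP) / 2.
mean_score = (auc + map_score) / 2

# Rounding error is at most 0.00005 in AUC and mAP (hence in their mean) and
# at most 0.00005 in the reported score, so agreement within 0.0001 is expected.
consistent = math.isclose(mean_score, reported_score, abs_tol=1e-4)

print(f"mean(AUC, mAP) = {mean_score:.5f}, consistent: {consistent}")
# mean(AUC, mAP) = 0.86655, consistent: True
```

Agreement within rounding does not prove that this is the paper's definition, but it is a quick sanity check on how the reported metrics relate to one another.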
Source journal: Computers in Biology and Medicine (Engineering & Technology: Biomedical Engineering)
CiteScore: 11.70
Self-citation rate: 10.40%
Articles published: 1086
Review time: 74 days
About the journal: Computers in Biology and Medicine is an international forum for sharing groundbreaking advances in the use of computers in bioscience and medicine. The journal communicates essential research, instruction, ideas, and information in this rapidly evolving field. By encouraging the exchange of knowledge, it aims to foster progress and innovation in the use of computers in biology and medicine.