How Foundational Is the Retina Foundation Model? Estimating RETFound’s Label Efficiency on Binary Classification of Normal versus Abnormal OCT Images

IF 3.2 Q1 OPHTHALMOLOGY
David Kuo MD, Qitong Gao PhD, Dev Patel MS, Miroslav Pajic PhD, Majda Hadziahmetovic MD
Ophthalmology Science, 5(3), Article 100707. Published January 11, 2025. DOI: 10.1016/j.xops.2025.100707. https://www.sciencedirect.com/science/article/pii/S2666914525000053

Abstract

Objective

While the availability of public internet-scale datasets of images and language has catalyzed remarkable progress in machine learning, medical datasets are constrained by regulations protecting patient privacy and the time and cost required for curation and labeling. Self-supervised learning or pretraining has demonstrated great success in learning meaningful representations from large unlabeled datasets to enable efficient learning on downstream tasks. In ophthalmology, the RETFound model, a large vision transformer (ViT-L) model trained by masked autoencoding on 1.6 million color fundus photos and OCT B-scans, is the first model pretrained at such scale for ophthalmology, demonstrating strong performance on downstream tasks from diabetic retinopathy grading to stroke detection. Here, we measure the label efficiency of the RETFound model in learning to identify normal vs. abnormal OCT B-scans obtained as part of a pilot study for primary care-based diabetic retinopathy screening in North Carolina.

Design

The 1150 TopCon Maestro OCT central B-scans (981 normal and 169 abnormal) were randomly split 80/10/10 into training, validation, and test datasets. Model training and hyperparameter tuning were performed on the training set guided by validation set performance. The best-performing models were then evaluated on the final test set.
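The split procedure above can be sketched as follows. This is a minimal illustration, not the study's actual code: the seed, the shuffling scheme, and the exact rounding of the 80/10/10 fractions are assumptions (the paper reports 915 training scans, slightly fewer than a naive 80% of 1150), and the real study may have split at the patient rather than the scan level.

```python
import numpy as np

rng = np.random.default_rng(seed=0)  # hypothetical seed; the study used 3 seeds

n_scans = 1150                  # total central B-scans in the dataset
idx = rng.permutation(n_scans)  # shuffle scan indices before splitting

# 80/10/10 split into training, validation, and test index sets
n_train = int(0.8 * n_scans)    # 920 here; the paper reports 915 training scans
n_val = int(0.1 * n_scans)      # 115

train_idx = idx[:n_train]
val_idx = idx[n_train:n_train + n_val]
test_idx = idx[n_train + n_val:]
```

Any scan appears in exactly one of the three sets, so validation-guided tuning never sees the held-out test scans.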

Subjects

Six hundred forty-seven patients with diabetes in the Duke Health System participating in primary care diabetic retinopathy screening contributed 1150 TopCon Maestro OCT central B-scans.

Methods

Three models (ResNet-50, ViT-L, and RETFound) were fine-tuned on the full training dataset of 915 OCT B-scans and on smaller training subsets of 500, 250, 100, and 50 OCT B-scans, each across 3 random seeds.
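The label-efficiency protocol described above (5 training-set sizes × 3 seeds per architecture) can be sketched as a subset-sampling loop. The variable names and the use of sampling without replacement from the full training set are assumptions for illustration; the actual fine-tuning code is not part of this abstract.

```python
import numpy as np

subset_sizes = [915, 500, 250, 100, 50]  # training-set sizes from the paper
seeds = [0, 1, 2]                        # 3 random seeds per configuration

full_train_idx = np.arange(915)          # indices of the full training set

runs = []
for seed in seeds:
    rng = np.random.default_rng(seed)
    for n in subset_sizes:
        # sample n training examples without replacement for this run;
        # each (model, seed, size) triple would launch one fine-tuning job
        subset = rng.choice(full_train_idx, size=n, replace=False)
        runs.append((seed, n, subset))

# 3 seeds x 5 sizes = 15 fine-tuning runs per model architecture
```

Averaging each metric over the 3 seeds at each size is what yields the per-size comparison between ResNet-50, ViT-L, and RETFound reported in the Results.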

Main Outcome Measures

Mean accuracy, area under the receiver operating characteristic curve (AUROC), area under the precision-recall curve (AUPRC), F1 score, precision, and recall on the final held-out test set were reported for each model.
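For reference, the threshold-based metrics above have simple closed forms. A minimal sketch, treating "abnormal" as the positive class (an assumption; the abstract does not state the label convention) — AUROC and AUPRC additionally require predicted probabilities and are typically computed with library routines such as scikit-learn's `roc_auc_score` and `average_precision_score`:

```python
import numpy as np

def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall, and F1 for binary labels,
    with abnormal coded as the positive class (1)."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    tp = np.sum((y_pred == 1) & (y_true == 1))  # true positives
    fp = np.sum((y_pred == 1) & (y_true == 0))  # false positives
    fn = np.sum((y_pred == 0) & (y_true == 1))  # false negatives
    accuracy = np.mean(y_pred == y_true)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}
```

With the dataset's roughly 6:1 normal-to-abnormal imbalance, AUPRC, F1, precision, and recall are more informative than raw accuracy, which a trivial "always normal" classifier would already push above 85%.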

Results

Across 3 random seeds and all training dataset sizes, RETFound outperformed both ResNet-50 and ViT-L on all evaluation metrics on the final held-out test dataset. ViT-L and ResNet-50 performed comparably at the largest training dataset sizes of 915 and 500 OCT B-scans; however, ResNet-50 suffered more pronounced performance degradation at the smallest dataset sizes of 100 and 50 OCT B-scans.

Conclusions

Our findings validate the benefits of RETFound's additional retina-specific pretraining. Further research is needed to establish best practices for fine-tuning RETFound to downstream tasks.

Financial Disclosure(s)

Proprietary or commercial disclosure may be found in the Footnotes and Disclosures at the end of this article.