Can a Natural Image-Based Foundation Model Outperform a Retina-Specific Model in Detecting Ocular and Systemic Diseases?

Impact factor: 4.6 (JCR Q1, Ophthalmology)

Ophthalmology Science. Pub Date: 2025-08-27. DOI: 10.1016/j.xops.2025.100923

Qingshan Hou PhD, Yukun Zhou PhD, Jocelyn Hui Lin Goh BEng, Ke Zou PhD, Samantha Min Er Yew BSc, Sahana Srinivasan BEng, Meng Wang PhD, Thaddaeus Wai Soon Lo BEng, Xiaofeng Lei MSc, Siegfried K. Wagner MD, PhD, Mark A. Chia MD, PhD, Gabriel Dawei Yang MD, PhD, Hongyang Jiang PhD, An Ran Ran MD, PhD, Rui Santos PhD, Gabor Mark Somfai MD, Juan Helen Zhou PhD, Haoyu Chen MD, Qingyu Chen PhD, Carol Y. Cheung PhD, Yih Chung Tham PhD

Volume 6, issue 1, Article 100923. Full text: https://www.sciencedirect.com/science/article/pii/S2666914525002210


Abstract

Purpose

DINOv2 is a natural image-based foundation model (FM), pretrained exclusively on 142 million natural images from the LVD-142M data set. In contrast, RETFound is a retina-specific FM, pretrained on ∼3 million images, including natural images, color fundus photos, and OCT images (∼1 million each). Despite DINOv2's massive pretraining data set, its application in ophthalmology and its performance relative to domain-specific FMs remain understudied. To address this gap, we conducted a head-to-head comparative evaluation of DINOv2 and RETFound models across a range of downstream ocular and systemic disease tasks.

Design

Retrospective head-to-head evaluation.

Subjects

Ocular disease detection tasks included diabetic retinopathy (DR), glaucoma, and multiclass eye diseases, whereas systemic disease incidence prediction focused on the 3-year incidence of heart failure, myocardial infarction, and ischemic stroke. Eight open-source data sets (APTOS-2019, IDRID, MESSIDOR2 for DR; PAPILA, Glaucoma Fundus for glaucoma; JSIEC, Retina, OCTID for multiclass eye diseases) and the Moorfields AlzEye data set (for systemic diseases) were used for fine-tuning and internal testing. External test sets included the same open-source data sets (cross-dataset validation) and the UK Biobank (for systemic diseases).

Methods

We replicated the fine-tuning methodology from the original RETFound study on 3 DINOv2 models (large, base, small). All models were fine-tuned on the respective data sets and evaluated through internal and external testing.
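The study design described under Subjects and Methods can be laid out as an explicit grid of runs. The following is a minimal sketch, not the authors' code: the model and data set names come from the abstract, but the function names and the tuple layout are hypothetical. It also assumes that cross-dataset validation pairs each fine-tuning data set with the other data sets of the same task; the abstract does not state which pairs were actually evaluated, and such pairing may not apply where label sets differ.

```python
from itertools import product

# Names taken from the abstract.
MODELS = ["RETFound", "DINOv2-large", "DINOv2-base", "DINOv2-small"]

OCULAR_DATASETS = {
    "diabetic retinopathy": ["APTOS-2019", "IDRID", "MESSIDOR2"],
    "glaucoma": ["PAPILA", "Glaucoma Fundus"],
    "multiclass eye diseases": ["JSIEC", "Retina", "OCTID"],
}

SYSTEMIC_OUTCOMES = ["heart failure", "myocardial infarction", "ischemic stroke"]


def ocular_runs():
    """Yield (model, task, fine_tune_set, test_set, kind) for ocular tasks."""
    for task, datasets in OCULAR_DATASETS.items():
        for model, train_set in product(MODELS, datasets):
            # Internal test: held-out split of the fine-tuning data set.
            yield (model, task, train_set, train_set, "internal")
            # ASSUMPTION: external test on every other data set of the same task.
            for test_set in datasets:
                if test_set != train_set:
                    yield (model, task, train_set, test_set, "external")


def systemic_runs():
    """Yield runs for 3-year incidence prediction (AlzEye internal, UK Biobank external)."""
    for model, outcome in product(MODELS, SYSTEMIC_OUTCOMES):
        yield (model, outcome, "Moorfields AlzEye", "Moorfields AlzEye", "internal")
        yield (model, outcome, "Moorfields AlzEye", "UK Biobank", "external")


if __name__ == "__main__":
    for run in list(ocular_runs())[:3]:
        print(run)
```

Enumerating the grid this way makes the size of the comparison explicit and keeps every model on an identical set of fine-tuning and test pairs, which is what a head-to-head evaluation requires.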

Main Outcome Measures

Area under the receiver operating characteristic curve (AUC) and 2-sided t-tests were used to compare the models' performance.
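Both outcome measures can be sketched in standard-library Python. This is an illustration under stated assumptions rather than the authors' procedure: the abstract does not say how many AUC values each model contributed to the t-test or which t-test variant was used, so the sketch assumes several AUCs per model (for example, from repeated fine-tuning runs) and uses Welch's unequal-variance form. The function names `auc` and `welch_t` are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def auc(labels, scores):
    """AUC as the probability that a random positive case scores higher
    than a random negative case (ties count as one half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def welch_t(a, b):
    """Welch's t statistic and degrees of freedom for two samples of AUCs.

    ASSUMPTION: each sample holds AUCs from repeated runs of one model.
    """
    va = variance(a) / len(a)  # squared standard error of mean(a)
    vb = variance(b) / len(b)  # squared standard error of mean(b)
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


if __name__ == "__main__":
    # Toy labels and scores, not data from the study.
    print(auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
    # Made-up per-run AUCs for two hypothetical models.
    print(welch_t([0.90, 0.91, 0.92], [0.85, 0.86, 0.87]))
```

Turning the t statistic into a 2-sided P value requires the CDF of Student's t distribution, which the standard library does not provide; with SciPy installed, `scipy.stats.ttest_ind(a, b, equal_var=False)` returns both the statistic and the P value.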

Results

For ocular disease detection, DINOv2 models generally outperformed RETFound. For DR, DINOv2-large achieved AUCs of 0.850 to 0.952, exceeding RETFound's 0.823 to 0.944 (all P ≤ 0.007). For multiclass eye diseases, DINOv2-large (AUC = 0.892, Retina data set) surpassed RETFound (AUC = 0.846, P < 0.001). For glaucoma, DINOv2-base (AUC = 0.958, Glaucoma Fundus) outperformed RETFound (AUC = 0.940, P < 0.001). Conversely, for systemic disease incidence prediction, RETFound achieved AUCs of 0.796 (heart failure), 0.732 (myocardial infarction), and 0.754 (ischemic stroke), exceeding the AUCs of the best-performing DINOv2 models (0.663–0.771; all P < 0.001). This trend persisted in external validation.
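The size of the reported gaps is easier to compare once the paired AUCs are tabulated. The short sketch below uses only the two single-data-set comparisons that the Results paragraph gives as explicit pairs; the record layout and the function name `auc_gaps` are hypothetical.

```python
# (task, data set, DINOv2 variant, DINOv2 AUC, RETFound AUC), as reported in Results.
REPORTED = [
    ("multiclass eye diseases", "Retina", "DINOv2-large", 0.892, 0.846),
    ("glaucoma", "Glaucoma Fundus", "DINOv2-base", 0.958, 0.940),
]


def auc_gaps(rows):
    """Return the DINOv2 minus RETFound AUC difference per (task, data set)."""
    return {
        (task, dataset): round(dino_auc - retfound_auc, 3)
        for task, dataset, _variant, dino_auc, retfound_auc in rows
    }


if __name__ == "__main__":
    for key, gap in auc_gaps(REPORTED).items():
        print(key, gap)
```

The DR and systemic results are omitted because the abstract reports them as ranges across data sets or models, without pairing each DINOv2 value to a specific RETFound value.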

Conclusions

Our findings reveal the merits of DINOv2 in ocular disease detection tasks, whereas RETFound demonstrates an edge in systemic disease incidence prediction. These findings showcase the distinct scenarios where general-purpose and domain-specific FMs excel, highlighting the importance of aligning FM selection with task-specific requirements to optimize clinical performance.

Financial Disclosure(s)

Proprietary or commercial disclosure may be found in the Footnotes and Disclosures at the end of this article.


