Murat S. Ayhan PhD , Ariel Y. Ong MD , Eden Ruffell , Siegfried K. Wagner MD, PhD , David A. Merle MD , Pearse A. Keane MD
{"title":"In-Context Learning for Data-Efficient Diabetic Retinopathy Detection via Multimodal Foundation Models","authors":"Murat S. Ayhan PhD , Ariel Y. Ong MD , Eden Ruffell , Siegfried K. Wagner MD, PhD , David A. Merle MD , Pearse A. Keane MD","doi":"10.1016/j.xops.2025.100934","DOIUrl":null,"url":null,"abstract":"<div><h3>Objective</h3><div>This study aims to evaluate whether in-context learning (ICL), a prompt-based learning mechanism enabling multimodal foundation models to rapidly adapt to new tasks without retraining or large annotated datasets, can achieve comparable diagnostic performance to domain-specific foundation models. Specifically, we use diabetic retinopathy (DR) detection as an exemplar task to probe if a multimodal foundation model (Google Gemini 1.5 Pro), employing ICL, can match the performance of a domain-specific model (RETFound) fine-tuned explicitly for DR detection from color fundus photographs (CFPs).</div></div><div><h3>Design</h3><div>A cross-sectional study.</div></div><div><h3>Subjects</h3><div>A retrospective, publicly available dataset (Indian Diabetic Retinopathy Image Dataset) comprising 516 CFPs collected at an eye clinic in India, featuring both healthy individuals and patients with DR.</div></div><div><h3>Methods</h3><div>The images were dichotomized into 2 groups based on the presence or absence of any signs of DR. RETFound was fine-tuned for this binary classification task, while Gemini 1.5 Pro was assessed for it under zero-shot and few-shot prompting scenarios, with the latter incorporating random or k-nearest-neighbors-based sampling of a varying number of example images. For experiments, data were partitioned into training, validation, and test sets in a stratified manner, with the process repeated for 10-fold cross-validation.</div></div><div><h3>Main Outcome Measures</h3><div>Performance was assessed via accuracy, F1 score, and expected calibration error of predictive probabilities. Statistical significance was evaluated using Wilcoxon tests.</div></div><div><h3>Results</h3><div>The best ICL performance with Gemini 1.5 Pro yielded an average accuracy of 0.841 (95% confidence interval [CI]: 0.803–0.879), an F1 score of 0.876 (95% CI: 0.844–0.909), and a calibration error of 0.129 (95% CI: 0.107–0.152). RETFound achieved an average accuracy of 0.849 (95% CI: 0.813–0.885), an F1 score of 0.883 (95% CI: 0.852–0.915), and a calibration error of 0.081 (95% CI: 0.066–0.097). While accuracy and F1 scores were comparable (<em>P</em> > 0.3), RETFound’s calibration was superior (<em>P</em> = 0.004).</div></div><div><h3>Conclusions</h3><div>Gemini 1.5 Pro with ICL demonstrated performance comparable to RETFound for binary DR detection, illustrating how future medical artificial intelligence systems may build upon such frontier models rather than being bespoke solutions.</div></div><div><h3>Financial Disclosure(s)</h3><div>Proprietary or commercial disclosure may be found in the Footnotes and Disclosures at the end of this article.</div></div>","PeriodicalId":74363,"journal":{"name":"Ophthalmology science","volume":"6 1","pages":"Article 100934"},"PeriodicalIF":4.6000,"publicationDate":"2025-09-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Ophthalmology science","FirstCategoryId":"1085","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S2666914525002325","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"OPHTHALMOLOGY","Score":null,"Total":0}
引用次数: 0
Abstract
Objective
This study aims to evaluate whether in-context learning (ICL), a prompt-based learning mechanism enabling multimodal foundation models to rapidly adapt to new tasks without retraining or large annotated datasets, can achieve comparable diagnostic performance to domain-specific foundation models. Specifically, we use diabetic retinopathy (DR) detection as an exemplar task to probe if a multimodal foundation model (Google Gemini 1.5 Pro), employing ICL, can match the performance of a domain-specific model (RETFound) fine-tuned explicitly for DR detection from color fundus photographs (CFPs).
Design
A cross-sectional study.
Subjects
A retrospective, publicly available dataset (Indian Diabetic Retinopathy Image Dataset) comprising 516 CFPs collected at an eye clinic in India, featuring both healthy individuals and patients with DR.
Methods
The images were dichotomized into 2 groups based on the presence or absence of any signs of DR. RETFound was fine-tuned for this binary classification task, while Gemini 1.5 Pro was assessed for it under zero-shot and few-shot prompting scenarios, with the latter incorporating random or k-nearest-neighbors-based sampling of a varying number of example images. For experiments, data were partitioned into training, validation, and test sets in a stratified manner, with the process repeated for 10-fold cross-validation.
Main Outcome Measures
Performance was assessed via accuracy, F1 score, and expected calibration error of predictive probabilities. Statistical significance was evaluated using Wilcoxon tests.
Results
The best ICL performance with Gemini 1.5 Pro yielded an average accuracy of 0.841 (95% confidence interval [CI]: 0.803–0.879), an F1 score of 0.876 (95% CI: 0.844–0.909), and a calibration error of 0.129 (95% CI: 0.107–0.152). RETFound achieved an average accuracy of 0.849 (95% CI: 0.813–0.885), an F1 score of 0.883 (95% CI: 0.852–0.915), and a calibration error of 0.081 (95% CI: 0.066–0.097). While accuracy and F1 scores were comparable (P > 0.3), RETFound’s calibration was superior (P = 0.004).
Conclusions
Gemini 1.5 Pro with ICL demonstrated performance comparable to RETFound for binary DR detection, illustrating how future medical artificial intelligence systems may build upon such frontier models rather than being bespoke solutions.
Financial Disclosure(s)
Proprietary or commercial disclosure may be found in the Footnotes and Disclosures at the end of this article.