In-Context Learning for Data-Efficient Diabetic Retinopathy Detection via Multimodal Foundation Models

Impact factor 4.6 · JCR Q1 (Ophthalmology)

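Ophthalmology Science, volume 6, issue 1, Article 100934. Published 2025-09-03. DOI: 10.1016/j.xops.2025.100934

https://www.sciencedirect.com/science/article/pii/S2666914525002325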

Murat S. Ayhan, PhD; Ariel Y. Ong, MD; Eden Ruffell; Siegfried K. Wagner, MD, PhD; David A. Merle, MD; Pearse A. Keane, MD


Citations: 0



In-Context Learning for Data-Efficient Diabetic Retinopathy Detection via Multimodal Foundation Models

Objective

This study aims to evaluate whether in-context learning (ICL), a prompt-based learning mechanism that enables multimodal foundation models to adapt rapidly to new tasks without retraining or large annotated datasets, can achieve diagnostic performance comparable to that of domain-specific foundation models. Specifically, we use diabetic retinopathy (DR) detection as an exemplar task to probe whether a multimodal foundation model (Google Gemini 1.5 Pro) employing ICL can match the performance of a domain-specific model (RETFound) fine-tuned explicitly for DR detection from color fundus photographs (CFPs).
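The ICL mechanism described above can be illustrated with a minimal sketch of how a few-shot multimodal prompt might be assembled: labeled demonstration images are interleaved with their answers, followed by the unlabeled query image. The `Example` and `ImagePart` types, the `build_icl_prompt` helper, and the prompt wording are hypothetical; the study's actual prompt text and the Gemini API call are not reproduced here.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Example:
    image_path: str  # hypothetical path to a labeled color fundus photograph
    label: str       # e.g. "DR" or "no DR"


@dataclass
class ImagePart:
    path: str        # placeholder for an image part; a real client would load the file


Part = Union[str, ImagePart]


def build_icl_prompt(task: str, examples: List[Example], query_path: str) -> List[Part]:
    """Interleave labeled demonstrations (image, answer) before the unlabeled query image."""
    parts: List[Part] = [task]
    for i, ex in enumerate(examples, start=1):
        parts.append(f"Example {i}:")
        parts.append(ImagePart(ex.image_path))
        parts.append(f"Answer: {ex.label}")
    parts.append("Query:")
    parts.append(ImagePart(query_path))
    parts.append("Answer:")  # the model is expected to complete this line
    return parts
```

Passing an empty example list yields a zero-shot prompt, so the same helper covers both prompting scenarios; only the number and choice of demonstrations change.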

Design

A cross-sectional study.

Subjects

A retrospective, publicly available dataset (Indian Diabetic Retinopathy Image Dataset) comprising 516 CFPs collected at an eye clinic in India, featuring both healthy individuals and patients with DR.

Methods

The images were dichotomized into 2 groups based on the presence or absence of any signs of DR. RETFound was fine-tuned for this binary classification task, whereas Gemini 1.5 Pro was evaluated on it under zero-shot and few-shot prompting; in the few-shot scenarios, a varying number of example images was sampled either at random or by k-nearest neighbors (kNN). For the experiments, data were partitioned into training, validation, and test sets in a stratified manner, and the process was repeated for 10-fold cross-validation.
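The dichotomization, stratified partitioning, and example-selection steps can be sketched in plain Python under several assumptions not stated in the abstract: labels are binarized from a 0 to 4 severity grade (any grade above 0 counts as DR), in each round one fold is the test set and the next fold the validation set, and kNN sampling ranks candidate images by Euclidean distance between precomputed image embeddings. All helper names and the embedding space are hypothetical illustrations, not the authors' implementation.

```python
import math
import random
from typing import Dict, List, Sequence, Tuple


def binarize(grades: Sequence[int]) -> List[int]:
    """Map severity grades to 1 (any sign of DR) or 0 (no DR); assumes grade 0 means no DR."""
    return [int(g > 0) for g in grades]


def stratified_folds(labels: Sequence[int], k: int = 10, seed: int = 0) -> List[List[int]]:
    """Split indices into k folds that preserve the class proportions as closely as possible."""
    rng = random.Random(seed)
    by_class: Dict[int, List[int]] = {}
    for idx, y in enumerate(labels):
        by_class.setdefault(y, []).append(idx)
    folds: List[List[int]] = [[] for _ in range(k)]
    offset = 0
    for y in sorted(by_class):
        idxs = by_class[y]
        rng.shuffle(idxs)
        # Deal each class round-robin, continuing where the previous class stopped
        for j, idx in enumerate(idxs):
            folds[(offset + j) % k].append(idx)
        offset += len(idxs)
    return folds


def split_round(folds: List[List[int]], i: int) -> Tuple[List[int], List[int], List[int]]:
    """Round i: fold i is the test set, fold i+1 the validation set, the rest is training."""
    k = len(folds)
    val_i = (i + 1) % k
    train = [idx for j, f in enumerate(folds) if j not in (i, val_i) for idx in f]
    return train, folds[val_i], folds[i]


def knn_examples(query: Sequence[float], pool: Sequence[Sequence[float]], n: int) -> List[int]:
    """Indices of the n pool embeddings nearest to the query embedding (Euclidean distance)."""
    return sorted(range(len(pool)), key=lambda j: math.dist(query, pool[j]))[:n]


def random_examples(pool_size: int, n: int, seed: int = 0) -> List[int]:
    """Indices of n pool items drawn uniformly without replacement."""
    return random.Random(seed).sample(range(pool_size), n)
```

In this sketch the demonstration pool for a round would presumably be the training split of that round, so that test images never appear as in-context examples; kNN selection then picks the training images that look most similar to each query, while random selection ignores image content.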

Main Outcome Measures

Performance was assessed via accuracy, F1 score, and expected calibration error of predictive probabilities. Statistical significance was evaluated using Wilcoxon tests.
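For a binary task the three outcome measures can be computed as in the following minimal sketch. The expected calibration error (ECE) here uses the common equal-width binning of top-label confidence; the abstract does not state the binning scheme or decision threshold, so `n_bins=10` and the 0.5 threshold are assumptions.

```python
from typing import List, Sequence, Tuple


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of predictions that match the reference labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_score(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> float:
    """Harmonic mean of precision and recall for the positive (DR) class."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def expected_calibration_error(y_true: Sequence[int], p_pos: Sequence[float], n_bins: int = 10) -> float:
    """Sample-weighted mean |accuracy - confidence| over equal-width confidence bins."""
    bins: List[List[Tuple[float, bool]]] = [[] for _ in range(n_bins)]
    for t, p in zip(y_true, p_pos):
        pred = int(p >= 0.5)    # assumed decision threshold
        conf = max(p, 1.0 - p)  # confidence in the predicted class
        b = min(int(conf * n_bins), n_bins - 1)
        bins[b].append((conf, pred == t))
    n = len(y_true)
    ece = 0.0
    for members in bins:
        if members:
            avg_conf = sum(c for c, _ in members) / len(members)
            acc = sum(ok for _, ok in members) / len(members)
            ece += len(members) / n * abs(acc - avg_conf)
    return ece
```

For example, labels [1, 0, 1, 1] with predicted DR probabilities [0.93, 0.15, 0.65, 0.35] give an accuracy of 0.75, an F1 score of 0.8, and an ECE of about 0.13. A lower ECE means the stated probabilities track observed accuracy more closely, which is the sense in which RETFound's calibration is reported as superior.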

Results

The best ICL performance with Gemini 1.5 Pro yielded an average accuracy of 0.841 (95% confidence interval [CI]: 0.803–0.879), an F1 score of 0.876 (95% CI: 0.844–0.909), and a calibration error of 0.129 (95% CI: 0.107–0.152). RETFound achieved an average accuracy of 0.849 (95% CI: 0.813–0.885), an F1 score of 0.883 (95% CI: 0.852–0.915), and a calibration error of 0.081 (95% CI: 0.066–0.097). While accuracy and F1 scores were comparable (P > 0.3), RETFound’s calibration was superior (P = 0.004).
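The abstract reports Wilcoxon tests without further detail. One plausible reading, assumed here, is a two-sided Wilcoxon signed-rank test on the 10 paired per-fold scores of the two models. With only 10 pairs the exact null distribution can be enumerated over all 2^10 sign assignments, as in the sketch below; `scipy.stats.wilcoxon` offers the same test in practice. The function names and the zero-difference handling (pairs with equal scores are dropped) are choices made for this illustration.

```python
from itertools import product
from typing import List, Sequence, Tuple


def abs_ranks(d: Sequence[float]) -> List[float]:
    """Ranks of |d| (1 = smallest); tied magnitudes share their average rank."""
    order = sorted(range(len(d)), key=lambda i: abs(d[i]))
    ranks = [0.0] * len(d)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and abs(d[order[j + 1]]) == abs(d[order[i]]):
            j += 1
        for m in range(i, j + 1):
            ranks[order[m]] = (i + j) / 2 + 1  # average of 1-based ranks i+1 .. j+1
        i = j + 1
    return ranks


def wilcoxon_signed_rank_exact(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Two-sided exact Wilcoxon signed-rank test on paired samples; returns (W, p)."""
    d = [a - b for a, b in zip(x, y) if a != b]  # drop zero differences
    ranks = abs_ranks(d)
    total = sum(ranks)
    w_plus = sum(r for r, di in zip(ranks, d) if di > 0)
    w = min(w_plus, total - w_plus)
    # Under H0 each rank is equally likely to carry a + or - sign
    extreme = 0
    for signs in product((False, True), repeat=len(d)):
        s = sum(r for r, pos in zip(ranks, signs) if pos)
        if min(s, total - s) <= w + 1e-9:
            extreme += 1
    return w, extreme / 2 ** len(d)
```

With 10 nonzero, untied differences the smallest attainable two-sided p-value is 2/1024 (about 0.002), reached when one model scores higher in every fold; a single reversal on the smallest difference gives 4/1024 (about 0.004).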

Conclusions

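Gemini 1.5 Pro with ICL demonstrated performance comparable to that of RETFound for binary DR detection, illustrating how future medical artificial intelligence systems may build upon such frontier models rather than relying on bespoke solutions.

Financial Disclosure(s)

Proprietary or commercial disclosure may be found in the Footnotes and Disclosures at the end of this article.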