Multimodal LLMs for retinal disease diagnosis via OCT: few-shot versus single-shot learning.

Impact factor 2.3, JCR Q2 (Ophthalmology)

Therapeutic Advances in Ophthalmology. Pub Date: 2025-05-20. eCollection Date: 2025-01-01. DOI: 10.1177/25158414251340569

Reem Agbareia, Mahmud Omar, Ofira Zloto, Benjamin S Glicksberg, Girish N Nadkarni, Eyal Klang

Volume 17, article 25158414251340569. Publication type: Journal Article. Full-text PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12093016/pdf/

Citations: 0

Abstract


Background and aim: Multimodal large language models (LLMs) have shown potential in processing both text and image data for clinical applications. This study evaluated their diagnostic performance in identifying retinal diseases from optical coherence tomography (OCT) images.

Methods: We assessed the diagnostic accuracy of GPT-4o and Claude Sonnet 3.5 using two public OCT datasets (OCTID, OCTDL) containing expert-labeled images of four pathological conditions and normal retinas. Both models were tested using single-shot and few-shot prompts, for a total of 3088 API calls across the two models. Statistical analyses were performed to evaluate differences in overall and condition-specific performance.
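The contrast between the two prompting modes can be illustrated with a minimal sketch. It is not the authors' prompt or any vendor's API schema: the `build_prompt` function, the `LabeledExample` type, the message-part dictionaries, and the file paths are all hypothetical. It also assumes that a single-shot prompt sends only the instruction and the query image, while a few-shot prompt first adds labeled example images; the abstract does not define either mode precisely.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence


@dataclass
class LabeledExample:
    """A reference OCT image with its expert label (hypothetical type)."""
    image_path: str
    label: str


def build_prompt(
    labels: Sequence[str],
    query_image: str,
    examples: Sequence[LabeledExample] = (),
) -> List[Dict[str, str]]:
    """Assemble an ordered list of prompt parts.

    With no examples this is the assumed single-shot form;
    with examples it is the assumed few-shot form.
    """
    instruction = (
        "Classify the OCT image as one of: "
        + ", ".join(labels)
        + ". Answer with the label only."
    )
    parts: List[Dict[str, str]] = [{"type": "text", "text": instruction}]

    # Few-shot only: each example image is followed by its known label.
    for ex in examples:
        parts.append({"type": "image", "path": ex.image_path})
        parts.append({"type": "text", "text": f"Label: {ex.label}"})

    # The query image always comes last, with the label left open.
    parts.append({"type": "image", "path": query_image})
    parts.append({"type": "text", "text": "Label:"})
    return parts


# Placeholder label names; the abstract does not list the four conditions.
LABELS = ["normal", "condition_a", "condition_b", "condition_c", "condition_d"]

single_shot = build_prompt(LABELS, "query.png")
few_shot = build_prompt(
    LABELS,
    "query.png",
    examples=[
        LabeledExample("ref_normal.png", "normal"),
        LabeledExample("ref_a.png", "condition_a"),
    ],
)
```

In this layout the only difference between the two modes is the block of example image and label pairs placed before the query, which keeps the comparison between modes controlled.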

Results: GPT-4o's accuracy improved from 56.29% with single-shot prompts to 73.08% with few-shot prompts (p < 0.001). Similarly, Claude Sonnet 3.5 increased from 40.03% to 70.98% using the same approach (p < 0.001). Condition-specific analyses revealed similar trends, with absolute improvements ranging from 2% to 64%. These findings were consistent across the validation dataset.
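The overall gains implied by these figures can be worked out directly. The short sketch below uses only the four accuracies reported above; the `improvement` helper is a hypothetical name introduced for illustration, and the relative-gain figure is a derived quantity that the abstract itself does not report.

```python
def improvement(single_shot_pct: float, few_shot_pct: float) -> tuple:
    """Return (absolute gain in percentage points, relative gain in percent)."""
    delta = few_shot_pct - single_shot_pct
    absolute = round(delta, 2)
    relative = round(100 * delta / single_shot_pct, 1)
    return absolute, relative


# Overall accuracies as reported in the Results paragraph.
reported = {
    "GPT-4o": (56.29, 73.08),
    "Claude Sonnet 3.5": (40.03, 70.98),
}

for model, (single, few) in reported.items():
    abs_pp, rel = improvement(single, few)
    print(f"{model}: +{abs_pp:.2f} percentage points ({rel:.1f}% relative)")
# GPT-4o: +16.79 percentage points (29.8% relative)
# Claude Sonnet 3.5: +30.95 percentage points (77.3% relative)
```

Both overall gains fall inside the 2% to 64% range of absolute improvements quoted for the condition-specific analyses.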

Conclusion: Few-shot prompted multimodal LLMs show promise for clinical integration, particularly in identifying normal retinas, which could help streamline referral processes in primary care. While these models fall short of the diagnostic accuracy reported in established deep learning literature, they offer simple, effective tools for assisting in routine retinal disease diagnosis. Future research should focus on further validation and integrating clinical text data with imaging.


Source journal: Therapeutic Advances in Ophthalmology (Ophthalmology)

CiteScore: 4.50

Self-citation rate: 0.00%

Review time: 12 weeks