VisionUnite: a Vision-Language Foundation Model for Ophthalmology Enhanced with Clinical Knowledge.

IF 18.6
Zihan Li, Diping Song, Zefeng Yang, Deming Wang, Fei Li, Xiulan Zhang, Paul E Kinahan, Yu Qiao
{"title":"VisionUnite: a Vision-Language Foundation Model for Ophthalmology Enhanced with Clinical Knowledge.","authors":"Zihan Li, Diping Song, Zefeng Yang, Deming Wang, Fei Li, Xiulan Zhang, Paul E Kinahan, Yu Qiao","doi":"10.1109/TPAMI.2025.3598734","DOIUrl":null,"url":null,"abstract":"<p><p>The need for improved diagnostic methods in ophthalmology is acute, especially in the underdeveloped regions with limited access to specialists and advanced equipment. Therefore, we introduce VisionUnite, a novel vision-language foundation model for ophthalmology enhanced with clinical knowledge. VisionUnite has been pretrained on an extensive dataset comprising 1.24 million image-text pairs, and further refined using our proposed MMFundus dataset, which includes 296,379 high-quality fundus image-text pairs and 889,137 simulated doctor-patient dialogue instances. Our experiments indicate that VisionUnite outperforms existing generative foundation models such as GPT4V and Gemini Pro. It also demonstrates diagnostic capabilities comparable to junior ophthalmologists. VisionUnite performs well in various clinical scenarios including open-ended multidisease diagnosis, clinical explanation, and patient interaction, making it a highly versatile tool for initial ophthalmic disease screening. VisionUnite can also serve as an educational aid for junior ophthalmologists, accelerating their acquisition of knowledge regarding both common and underrepresented ophthalmic conditions. VisionUnite represents a significant advancement in ophthalmology, with broad implications for diagnostics, medical education, and understanding of disease mechanisms. The source code is at https://github.com/HUANGLIZI/VisionUnite.</p>","PeriodicalId":94034,"journal":{"name":"IEEE transactions on pattern analysis and machine intelligence","volume":"PP ","pages":""},"PeriodicalIF":18.6000,"publicationDate":"2025-08-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE transactions on pattern analysis and machine intelligence","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/TPAMI.2025.3598734","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

The need for improved diagnostic methods in ophthalmology is acute, especially in the underdeveloped regions with limited access to specialists and advanced equipment. Therefore, we introduce VisionUnite, a novel vision-language foundation model for ophthalmology enhanced with clinical knowledge. VisionUnite has been pretrained on an extensive dataset comprising 1.24 million image-text pairs, and further refined using our proposed MMFundus dataset, which includes 296,379 high-quality fundus image-text pairs and 889,137 simulated doctor-patient dialogue instances. Our experiments indicate that VisionUnite outperforms existing generative foundation models such as GPT4V and Gemini Pro. It also demonstrates diagnostic capabilities comparable to junior ophthalmologists. VisionUnite performs well in various clinical scenarios including open-ended multidisease diagnosis, clinical explanation, and patient interaction, making it a highly versatile tool for initial ophthalmic disease screening. VisionUnite can also serve as an educational aid for junior ophthalmologists, accelerating their acquisition of knowledge regarding both common and underrepresented ophthalmic conditions. VisionUnite represents a significant advancement in ophthalmology, with broad implications for diagnostics, medical education, and understanding of disease mechanisms. The source code is at https://github.com/HUANGLIZI/VisionUnite.

VisionUnite:临床知识增强的眼科视觉语言基础模型。
改进眼科诊断方法的需求是迫切的,特别是在缺乏专家和先进设备的不发达地区。因此,我们推出了一种新的视觉语言基础模型VisionUnite,这是一种通过临床知识增强的眼科视觉语言基础模型。VisionUnite在包含124万图像文本对的庞大数据集上进行了预训练,并使用我们提出的MMFundus数据集进行了进一步优化,该数据集包括296,379对高质量的眼底图像文本和889,137个模拟医患对话实例。我们的实验表明,VisionUnite优于现有的生成基础模型,如GPT4V和Gemini Pro。它还展示了与初级眼科医生相当的诊断能力。VisionUnite在开放式多病诊断、临床解释和患者互动等多种临床场景中表现良好,是一种高度通用的眼科疾病初步筛查工具。VisionUnite还可以作为初级眼科医生的教育辅助工具,加速他们获得有关常见和未被充分代表的眼科疾病的知识。VisionUnite代表了眼科的重大进步,对诊断、医学教育和疾病机制的理解具有广泛的意义。源代码在https://github.com/HUANGLIZI/VisionUnite。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:604180095
Book学术官方微信