Assessing the proficiency of large language models on funduscopic disease knowledge.

IF 1.8 | CAS Zone 4 (Medicine) | JCR Q2 (Ophthalmology)
International Journal of Ophthalmology · Pub Date: 2025-07-18 · eCollection Date: 2025-01-01 · DOI: 10.18240/ijo.2025.07.03
Jun-Yi Wu, Yan-Mei Zeng, Xian-Zhe Qian, Qi Hong, Jin-Yu Hu, Hong Wei, Jie Zou, Cheng Chen, Xiao-Yu Wang, Xu Chen, Yi Shao
Volume 18, Issue 7, pp. 1205-1213. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12207300/pdf/
Citations: 0

Abstract

Assessing the proficiency of large language models on funduscopic disease knowledge.

Aim: To assess the performance of five distinct large language models (LLMs; ChatGPT-3.5, ChatGPT-4, PaLM2, Claude 2, and SenseNova) in comparison to two human cohorts (a group of funduscopic disease experts and a group of ophthalmologists) on the specialized subject of funduscopic disease.

Methods: Five distinct LLMs and two distinct human groups independently completed a 100-item funduscopic disease test. The performance of these entities was assessed by comparing their average scores, response stability, and answer confidence, thereby establishing a basis for evaluation.
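The metrics named above (average score and response stability across repeated attempts) can be sketched as follows. This is an illustrative reconstruction, not the paper's actual scoring code; the function names and toy data are hypothetical, and the paper may define stability differently.

```python
# Hypothetical sketch: each model answers the 100-item test several times;
# we compute its mean score and a simple response-stability measure
# (the fraction of items answered identically on every run).
from statistics import mean

def average_score(runs, answer_key):
    """Mean fraction of correct answers across repeated runs."""
    return mean(
        sum(a == k for a, k in zip(run, answer_key)) / len(answer_key)
        for run in runs
    )

def response_stability(runs):
    """Fraction of items that received the same answer on every run."""
    n_items = len(runs[0])
    stable = sum(len({run[i] for run in runs}) == 1 for i in range(n_items))
    return stable / n_items

# Toy example: three runs over a four-item test.
key = ["A", "C", "B", "D"]
runs = [["A", "C", "B", "A"],
        ["A", "C", "B", "D"],
        ["A", "C", "D", "D"]]
print(average_score(runs, key))
print(response_stability(runs))
```

Here the model scores 3/4, 4/4, and 3/4 across the three runs, and answers two of the four items consistently every time.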

Results: Among all the LLMs, ChatGPT-4 and PaLM2 exhibited the most substantial average correlation. Additionally, ChatGPT-4 achieved the highest average score and demonstrated the utmost confidence during the exam. In comparison to human cohorts, ChatGPT-4 exhibited comparable performance to ophthalmologists, albeit falling short of the expertise demonstrated by funduscopic disease specialists.
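The inter-model correlation reported above can be illustrated with a Pearson correlation over per-item correctness vectors. This is a minimal sketch under that assumption; the variable names and data are made up, and the paper does not state exactly which correlation it used.

```python
# Illustrative sketch: compare two models' per-item correctness
# (1 = answered correctly, 0 = not) with a Pearson correlation.
def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

gpt4_items  = [1, 1, 0, 1, 1, 0]   # hypothetical six-item slice
palm2_items = [1, 1, 0, 1, 0, 0]
print(round(pearson(gpt4_items, palm2_items), 3))
```

Models that succeed and fail on mostly the same items produce a correlation near 1, which is the sense in which ChatGPT-4 and PaLM2 are described as most strongly correlated.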

Conclusion: The study provides evidence of the exceptional performance of ChatGPT-4 in the domain of funduscopic disease. With continued enhancements, validated LLMs have the potential to yield unforeseen advantages in enhancing healthcare for both patients and physicians.

Source journal
CiteScore: 2.50
Self-citation rate: 7.10%
Articles published: 3141
Review time: 4-8 weeks
Journal introduction: International Journal of Ophthalmology (IJO, English edition) is a global ophthalmological scientific publication and a peer-reviewed open-access periodical (ISSN 2222-3959 print, ISSN 2227-4898 online). The journal is sponsored by the Chinese Medical Association Xi'an Branch and receives guidance and support from the WHO and the International Council of Ophthalmology (ICO). It is indexed in SCIE, PubMed, PubMed Central, Chemical Abstracts, Scopus, EMBASE, and DOAJ. IJO's JCR impact factor in 2017 was 1.166. IJO was established in 2008, with its editorial office in Xi'an, China, and is published monthly.

General Scientific Advisors include Prof. Hugh Taylor (President of ICO), Prof. Bruce Spivey (Immediate Past President of ICO), Prof. Mark Tso (Ex-Vice President of ICO), and Prof. Daiming Fan (Academician and Vice President, Chinese Academy of Engineering). International Scientific Advisors include Prof. Serge Resnikoff (WHO Senior Specialist for Prevention of Blindness), Prof. Chi-Chao Chan (National Eye Institute, USA), and Prof. Richard L. Abbott (Ex-President of AAO/PAAO), et al.

Honorary Editors-in-Chief: Prof. Li-Xin Xie (Academician of the Chinese Academy of Engineering; Honorary President of the Chinese Ophthalmological Society), Prof. Dennis Lam (President of APAO), and Prof. Xiao-Xin Li (Ex-President of the Chinese Ophthalmological Society). Chief Editor: Prof. Xiu-Wen Hu (President of IJO Press). Editors-in-Chief: Prof. Yan-Nian Hui (Ex-Director, Eye Institute of Chinese PLA) and Prof. George Chiou (Founding Chief Editor of the Journal of Ocular Pharmacology & Therapeutics). Associate Editors-in-Chief include: Prof. Ning-Li Wang (President-Elect of APAO); Prof. Ke Yao (President of the Chinese Ophthalmological Society); Prof. William Smiddy (Bascom Palmer Eye Institute, USA); Prof. Joel Schuman (President of the Association of University Professors of Ophthalmology, USA); Prof. Yizhi Liu (Vice President of the Chinese Ophthalmological Society); Prof. Yu-Sheng Wang (Director, Eye Institute of Chinese PLA); and Prof. Ling-Yun Cheng (Director of Ocular Pharmacology, Shiley Eye Center, USA).

IJO accepts contributions in English from all over the world, mainly original articles and review articles, both basic and clinical. Cooperating organizations include the International Council of Ophthalmology (ICO), PubMed, PMC, the American Academy of Ophthalmology, Asia-Pacific, Thomson Reuters, The Charlesworth Group, Crossref, Scopus, Publons, DOAJ, etc.