Assessing the possibility of using large language models in ocular surface diseases.

IF 1.9 | CAS Q4 (Medicine) | JCR Q2 (Ophthalmology)
International Journal of Ophthalmology. Pub Date: 2025-01-18; eCollection Date: 2025-01-01. DOI: 10.18240/ijo.2025.01.01
Qian Ling, Zi-Song Xu, Yan-Mei Zeng, Qi Hong, Xian-Zhe Qian, Jin-Yu Hu, Chong-Gang Pei, Hong Wei, Jie Zou, Cheng Chen, Xiao-Yu Wang, Xu Chen, Zhen-Kai Wu, Yi Shao
{"title":"评估在眼表疾病中使用大语言模型的可能性。","authors":"Qian Ling, Zi-Song Xu, Yan-Mei Zeng, Qi Hong, Xian-Zhe Qian, Jin-Yu Hu, Chong-Gang Pei, Hong Wei, Jie Zou, Cheng Chen, Xiao-Yu Wang, Xu Chen, Zhen-Kai Wu, Yi Shao","doi":"10.18240/ijo.2025.01.01","DOIUrl":null,"url":null,"abstract":"<p><strong>Aim: </strong>To assess the possibility of using different large language models (LLMs) in ocular surface diseases by selecting five different LLMS to test their accuracy in answering specialized questions related to ocular surface diseases: ChatGPT-4, ChatGPT-3.5, Claude 2, PaLM2, and SenseNova.</p><p><strong>Methods: </strong>A group of experienced ophthalmology professors were asked to develop a 100-question single-choice question on ocular surface diseases designed to assess the performance of LLMs and human participants in answering ophthalmology specialty exam questions. The exam includes questions on the following topics: keratitis disease (20 questions), keratoconus, keratomalaciac, corneal dystrophy, corneal degeneration, erosive corneal ulcers, and corneal lesions associated with systemic diseases (20 questions), conjunctivitis disease (20 questions), trachoma, pterygoid and conjunctival tumor diseases (20 questions), and dry eye disease (20 questions). Then the total score of each LLMs and compared their mean score, mean correlation, variance, and confidence were calculated.</p><p><strong>Results: </strong>GPT-4 exhibited the highest performance in terms of LLMs. Comparing the average scores of the LLMs group with the four human groups, chief physician, attending physician, regular trainee, and graduate student, it was found that except for ChatGPT-4, the total score of the rest of the LLMs is lower than that of the graduate student group, which had the lowest score in the human group. Both ChatGPT-4 and PaLM2 were more likely to give exact and correct answers, giving very little chance of an incorrect answer. ChatGPT-4 showed higher credibility when answering questions, with a success rate of 59%, but gave the wrong answer to the question 28% of the time.</p><p><strong>Conclusion: </strong>GPT-4 model exhibits excellent performance in both answer relevance and confidence. PaLM2 shows a positive correlation (up to 0.8) in terms of answer accuracy during the exam. In terms of answer confidence, PaLM2 is second only to GPT4 and surpasses Claude 2, SenseNova, and GPT-3.5. 
Despite the fact that ocular surface disease is a highly specialized discipline, GPT-4 still exhibits superior performance, suggesting that its potential and ability to be applied in this field is enormous, perhaps with the potential to be a valuable resource for medical students and clinicians in the future.</p>","PeriodicalId":14312,"journal":{"name":"International journal of ophthalmology","volume":"18 1","pages":"1-8"},"PeriodicalIF":1.9000,"publicationDate":"2025-01-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11672086/pdf/","citationCount":"0","resultStr":"{\"title\":\"Assessing the possibility of using large language models in ocular surface diseases.\",\"authors\":\"Qian Ling, Zi-Song Xu, Yan-Mei Zeng, Qi Hong, Xian-Zhe Qian, Jin-Yu Hu, Chong-Gang Pei, Hong Wei, Jie Zou, Cheng Chen, Xiao-Yu Wang, Xu Chen, Zhen-Kai Wu, Yi Shao\",\"doi\":\"10.18240/ijo.2025.01.01\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p><strong>Aim: </strong>To assess the possibility of using different large language models (LLMs) in ocular surface diseases by selecting five different LLMS to test their accuracy in answering specialized questions related to ocular surface diseases: ChatGPT-4, ChatGPT-3.5, Claude 2, PaLM2, and SenseNova.</p><p><strong>Methods: </strong>A group of experienced ophthalmology professors were asked to develop a 100-question single-choice question on ocular surface diseases designed to assess the performance of LLMs and human participants in answering ophthalmology specialty exam questions. The exam includes questions on the following topics: keratitis disease (20 questions), keratoconus, keratomalaciac, corneal dystrophy, corneal degeneration, erosive corneal ulcers, and corneal lesions associated with systemic diseases (20 questions), conjunctivitis disease (20 questions), trachoma, pterygoid and conjunctival tumor diseases (20 questions), and dry eye disease (20 questions). Then the total score of each LLMs and compared their mean score, mean correlation, variance, and confidence were calculated.</p><p><strong>Results: </strong>GPT-4 exhibited the highest performance in terms of LLMs. Comparing the average scores of the LLMs group with the four human groups, chief physician, attending physician, regular trainee, and graduate student, it was found that except for ChatGPT-4, the total score of the rest of the LLMs is lower than that of the graduate student group, which had the lowest score in the human group. Both ChatGPT-4 and PaLM2 were more likely to give exact and correct answers, giving very little chance of an incorrect answer. ChatGPT-4 showed higher credibility when answering questions, with a success rate of 59%, but gave the wrong answer to the question 28% of the time.</p><p><strong>Conclusion: </strong>GPT-4 model exhibits excellent performance in both answer relevance and confidence. PaLM2 shows a positive correlation (up to 0.8) in terms of answer accuracy during the exam. In terms of answer confidence, PaLM2 is second only to GPT4 and surpasses Claude 2, SenseNova, and GPT-3.5. 
Despite the fact that ocular surface disease is a highly specialized discipline, GPT-4 still exhibits superior performance, suggesting that its potential and ability to be applied in this field is enormous, perhaps with the potential to be a valuable resource for medical students and clinicians in the future.</p>\",\"PeriodicalId\":14312,\"journal\":{\"name\":\"International journal of ophthalmology\",\"volume\":\"18 1\",\"pages\":\"1-8\"},\"PeriodicalIF\":1.9000,\"publicationDate\":\"2025-01-18\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11672086/pdf/\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"International journal of ophthalmology\",\"FirstCategoryId\":\"3\",\"ListUrlMain\":\"https://doi.org/10.18240/ijo.2025.01.01\",\"RegionNum\":4,\"RegionCategory\":\"医学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"2025/1/1 0:00:00\",\"PubModel\":\"eCollection\",\"JCR\":\"Q2\",\"JCRName\":\"OPHTHALMOLOGY\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"International journal of ophthalmology","FirstCategoryId":"3","ListUrlMain":"https://doi.org/10.18240/ijo.2025.01.01","RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2025/1/1 0:00:00","PubModel":"eCollection","JCR":"Q2","JCRName":"OPHTHALMOLOGY","Score":null,"Total":0}
Citations: 0

Abstract


Aim: To assess the feasibility of using different large language models (LLMs) for ocular surface diseases by selecting five LLMs, ChatGPT-4, ChatGPT-3.5, Claude 2, PaLM2, and SenseNova, and testing their accuracy in answering specialized questions related to ocular surface diseases.

Methods: A group of experienced ophthalmology professors developed a 100-question single-choice examination on ocular surface diseases, designed to assess the performance of LLMs and human participants in answering ophthalmology specialty exam questions. The exam covered the following topics: keratitis (20 questions); keratoconus, keratomalacia, corneal dystrophy, corneal degeneration, erosive corneal ulcers, and corneal lesions associated with systemic diseases (20 questions); conjunctivitis (20 questions); trachoma, pterygium, and conjunctival tumor diseases (20 questions); and dry eye disease (20 questions). The total score of each LLM was then calculated, and the mean scores, mean correlations, variances, and confidence levels were compared.
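The scoring workflow described above (per-model totals over five 20-question topic blocks, plus mean score, correlation, and variance) is straightforward to reproduce. The sketch below is illustrative only, not the authors' code: the answer key and model responses are placeholders, and reading "mean correlation" as a Pearson correlation over per-question correctness is an assumption (requires Python 3.10+ for statistics.correlation).

    import statistics

    TOPICS = ["keratitis", "corneal disorders", "conjunctivitis",
              "trachoma/pterygium/tumors", "dry eye"]  # 20 questions per topic

    # Placeholder 100-item answer key and hypothetical model responses;
    # the real question bank is not published in the abstract.
    answer_key = ["A", "C", "B", "D"] * 25
    model_answers = {
        "ChatGPT-4": ["A", "C", "B", "A"] * 25,
        "PaLM2":     ["A", "B", "B", "D"] * 25,
    }

    def total_score(responses, key):
        """One point per correct single-choice answer (0-100 overall)."""
        return sum(r == k for r, k in zip(responses, key))

    def topic_means(responses, key, block=20):
        """Mean accuracy within each consecutive 20-question topic block."""
        return [total_score(responses[i:i + block], key[i:i + block]) / block
                for i in range(0, len(key), block)]

    for model, resp in model_answers.items():
        correct = [int(r == k) for r, k in zip(resp, answer_key)]
        print(model,
              "| total:", total_score(resp, answer_key),
              "| topic means:", dict(zip(TOPICS, topic_means(resp, answer_key))),
              "| variance:", round(statistics.pvariance(correct), 3))

    # Pearson correlation between two models' per-question correctness,
    # one plausible reading of the paper's "mean correlation" (assumption).
    a = [int(r == k) for r, k in zip(model_answers["ChatGPT-4"], answer_key)]
    b = [int(r == k) for r, k in zip(model_answers["PaLM2"], answer_key)]
    print("ChatGPT-4 vs PaLM2 correlation:",
          round(statistics.correlation(a, b), 3))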

Results: GPT-4 exhibited the highest performance among the LLMs. Comparing the average scores of the LLM group with the four human groups (chief physicians, attending physicians, regular trainees, and graduate students) showed that, except for ChatGPT-4, every LLM scored lower than the graduate student group, which had the lowest score among the human groups. Both ChatGPT-4 and PaLM2 were more likely to give exact and correct answers, with very little chance of an incorrect answer. ChatGPT-4 showed higher credibility when answering questions, with a success rate of 59%, but gave a wrong answer 28% of the time.

Conclusion: The GPT-4 model exhibits excellent performance in both answer relevance and confidence. PaLM2 shows a positive correlation (up to 0.8) in answer accuracy during the exam. In terms of answer confidence, PaLM2 is second only to GPT-4 and surpasses Claude 2, SenseNova, and GPT-3.5. Although ocular surface disease is a highly specialized discipline, GPT-4 still exhibits superior performance, suggesting that its potential for application in this field is substantial and that it may become a valuable resource for medical students and clinicians in the future.

Source journal metrics
CiteScore: 2.50
Self-citation rate: 7.10%
Articles published: 3141
Review time: 4-8 weeks
Journal introduction: International Journal of Ophthalmology (IJO, English edition) is a global ophthalmological scientific publication and a peer-reviewed open-access periodical (ISSN 2222-3959 print, ISSN 2227-4898 online). The journal is sponsored by the Chinese Medical Association Xi'an Branch and receives guidance and support from the WHO and the ICO (International Council of Ophthalmology). It has been indexed in SCIE, PubMed, PubMed Central, Chemical Abstracts, Scopus, EMBASE, and DOAJ. IJO's JCR IF in 2017 was 1.166. IJO was established in 2008, with its editorial office in Xi'an, China, and is a monthly publication. General Scientific Advisors include Prof. Hugh Taylor (President of ICO), Prof. Bruce Spivey (Immediate Past President of ICO), Prof. Mark Tso (Ex-Vice President of ICO), and Prof. Daiming Fan (Academician and Vice President, Chinese Academy of Engineering). International Scientific Advisors include Prof. Serge Resnikoff (WHO Senior Specialist for Prevention of Blindness), Prof. Chi-Chao Chan (National Eye Institute, USA), and Prof. Richard L Abbott (Ex-President of AAO/PAAO), et al. Honorary Editors-in-Chief: Prof. Li-Xin Xie (Academician of Chinese Academy of Engineering / Honorary President of Chinese Ophthalmological Society), Prof. Dennis Lam (President of APAO), and Prof. Xiao-Xin Li (Ex-President of Chinese Ophthalmological Society). Chief Editor: Prof. Xiu-Wen Hu (President of IJO Press). Editors-in-Chief: Prof. Yan-Nian Hui (Ex-Director, Eye Institute of Chinese PLA) and Prof. George Chiou (Founding chief editor of Journal of Ocular Pharmacology & Therapeutics). Associate Editors-in-Chief include: Prof. Ning-Li Wang (President Elect of APAO), Prof. Ke Yao (President of Chinese Ophthalmological Society), Prof. William Smiddy (Bascom Palmer Eye Institute, USA), Prof. Joel Schuman (President of Association of University Professors of Ophthalmology, USA), Prof. Yizhi Liu (Vice President of Chinese Ophthalmological Society), Prof. Yu-Sheng Wang (Director of Eye Institute of Chinese PLA), and Prof. Ling-Yun Cheng (Director of Ocular Pharmacology, Shiley Eye Center, USA). IJO accepts contributions in English from all over the world, mainly original articles and review articles, both basic and clinical. Cooperating organizations include the International Council of Ophthalmology (ICO), PubMed, PMC, American Academy of Ophthalmology, Asia-Pacific, Thomson Reuters, The Charlesworth Group, Crossref, Scopus, Publons, DOAJ, etc.