Comparative performance analysis of global and Chinese-domain large language models for myopia.

IF 2.8 · CAS Medicine Tier 3 · Q1 Ophthalmology
Eye · Pub Date: 2025-04-13 · DOI: 10.1038/s41433-025-03775-5
Zehua Jiang, Yueyuan Xu, Zhi Wei Lim, Ziyao Wang, Yingxiang Han, Samantha Min Er Yew, Zhe Pan, Qian Wang, Gangyue Wu, Tien Yin Wong, Xiaofei Wang, Yaxing Wang, Yih Chung Tham

Abstract

Background: How well global large language models (LLMs), trained largely on Western data, handle disease-related questions in other settings and languages is unknown. Taking myopia as an illustration, we evaluated global versus Chinese-domain LLMs in addressing Chinese-specific myopia-related questions.

Methods: Global LLMs (ChatGPT-3.5, ChatGPT-4.0, Google Bard, Llama-2 7B Chat) and Chinese-domain LLMs (Huatuo-GPT, MedGPT, Ali Tongyi Qianwen, Baidu ERNIE Bot, and Baidu ERNIE 4.0) were included. All LLMs were prompted to address 39 Chinese-specific myopia queries across 10 domains. Three myopia experts evaluated the accuracy of responses on a three-point scale. "Good"-rated responses were further evaluated for comprehensiveness and empathy on a five-point scale. "Poor"-rated responses were further prompted for self-correction and re-analyzed.
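The rating workflow above can be sketched as a simple aggregation. This is a minimal illustration with hypothetical per-question scores (the study's raw ratings are not reproduced here); the model names, score values, and "Good" threshold in the snippet are assumptions, not data from the paper:

```python
from statistics import mean, stdev

# Hypothetical accuracy ratings per question for two illustrative models.
# The real study rated 39 Chinese-specific myopia questions across nine
# LLMs, with three experts scoring each response.
ratings = {
    "ModelA": [3, 3, 2, 3, 3],
    "ModelB": [2, 3, 2, 2, 3],
}

def summarize(scores):
    """Return (mean, sample SD) - the 'mean +/- SD' format the abstract reports."""
    return mean(scores), stdev(scores)

for model, scores in ratings.items():
    m, sd = summarize(scores)
    # Assumed convention: a "Good" response is one with the top rating (3).
    good_pct = 100 * sum(s == 3 for s in scores) / len(scores)
    print(f"{model}: {m:.2f} +/- {sd:.2f} accuracy, {good_pct:.1f}% 'Good'")
```

Under this sketch, per-model accuracy and the proportion of "Good" responses fall out of the same pass over the ratings; the paper's between-model p-values would come from a separate significance test on these score lists.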

Results: The top three LLMs in accuracy were ChatGPT-3.5 (8.72 ± 0.75), Baidu ERNIE 4.0 (8.62 ± 0.62), and ChatGPT-4.0 (8.59 ± 0.93), each achieving the highest proportion of "Good" responses (94.8%). The top five LLMs for comprehensiveness were ChatGPT-3.5 (4.58 ± 0.42), ChatGPT-4.0 (4.56 ± 0.50), Baidu ERNIE 4.0 (4.44 ± 0.49), MedGPT (4.34 ± 0.59), and Baidu ERNIE Bot (4.22 ± 0.74) (all p ≥ 0.059 versus ChatGPT-3.5). For empathy, the top five were ChatGPT-3.5 (4.75 ± 0.25), ChatGPT-4.0 (4.68 ± 0.32), MedGPT (4.50 ± 0.47), Baidu ERNIE Bot (4.42 ± 0.46), and Baidu ERNIE 4.0 (4.34 ± 0.64) (all p ≥ 0.052 versus ChatGPT-3.5). Baidu ERNIE 4.0 received no "Poor" ratings, while the other models demonstrated self-correction capabilities, with improvement rates ranging from 50% to 100%.

Conclusions: Both global and Chinese-domain LLMs performed effectively in addressing Chinese-specific myopia-related queries. Global LLMs showed optimal performance in Chinese-language settings despite being trained primarily on non-Chinese, predominantly English data.

Source journal: Eye (Medicine – Ophthalmology)
CiteScore: 6.40
Self-citation rate: 5.10%
Articles per year: 481
Review time: 3-6 weeks
Journal overview: Eye seeks to provide the international practising ophthalmologist with high-quality articles, of academic rigour, on the latest global clinical and laboratory-based research. Its core aim is to advance the science and practice of ophthalmology with the latest clinical and scientific research. Whilst principally aimed at the practising clinician, the journal contains material of interest to a wider readership, including optometrists, orthoptists, other health care professionals, and research workers in all aspects of the field of visual science worldwide. Eye is the official journal of The Royal College of Ophthalmologists. Eye encourages the submission of original articles covering all aspects of ophthalmology, including: external eye disease; oculo-plastic surgery; orbital and lacrimal disease; ocular surface and corneal disorders; paediatric ophthalmology and strabismus; glaucoma; medical and surgical retina; neuro-ophthalmology; cataract and refractive surgery; ocular oncology; ophthalmic pathology; ophthalmic genetics.