Evaluating Large Language Models in Ptosis-Related Inquiries: A Cross-Lingual Study.

Impact Factor 2.6 · CAS Tier 3 (Medicine) · JCR Q2 (Ophthalmology)
Ling-Han Niu, Li Wei, Bixuan Qin, Tao Chen, Li Dong, Yueqing He, Xue Jiang, Mingyang Wang, Lan Ma, Jialu Geng, Lechen Wang, Dongmei Li
{"title":"Evaluating Large Language Models in Ptosis-Related inquiries: A Cross-Lingual Study.","authors":"Ling-Han Niu, Li Wei, Bixuan Qin, Tao Chen, Li Dong, Yueqing He, Xue Jiang, Mingyang Wang, Lan Ma, Jialu Geng, Lechen Wang, Dongmei Li","doi":"10.1167/tvst.14.7.9","DOIUrl":null,"url":null,"abstract":"<p><strong>Purpose: </strong>The purpose of this study was to evaluate the performance of large language models (LLMs)-GPT-4, GPT-4o, Qwen2, and Qwen2.5-in addressing patient- and clinician-focused questions on ptosis-related inquiries, emphasizing cross-lingual applicability and patient-centric assessment.</p><p><strong>Methods: </strong>We collected 11 patient-centric and 50 doctor-centric questions covering ptosis symptoms, treatment, and postoperative care. Responses generated by GPT-4, GPT-4o, Qwen2, and Qwen2.5 were evaluated using predefined criteria: accuracy, sufficiency, clarity, and depth (doctor questions); and helpfulness, clarity, and empathy (patient questions). Clinical assessments involved 30 patients with ptosis and 8 oculoplastic surgeons rating responses on a 5-point Likert scale.</p><p><strong>Results: </strong>For doctor questions, GPT-4o outperformed Qwen2.5 in overall performance (53.1% vs. 18.8%, P = 0.035) and completeness (P = 0.049). For patient questions, GPT-4o scored higher in helpfulness (mean rank = 175.28 vs. 155.72, P = 0.035), with no significant differences in clarity or empathy. Qwen2.5 exhibited superior Chinese-language clarity compared to English (P = 0.023).</p><p><strong>Conclusions: </strong>LLMs, particularly GPT-4o, demonstrate robust performance in ptosis-related inquiries, excelling in English and offering clinically valuable insights. Qwen2.5 showed advantages in Chinese clarity. Although promising for patient education and clinician support, these models require rigorous validation, domain-specific training, and cultural adaptation before clinical deployment. Future efforts should focus on refining multilingual capabilities and integrating real-time expert oversight to ensure safety and relevance in diverse healthcare contexts.</p><p><strong>Translational relevance: </strong>This study bridges artificial intelligence (AI) advancements with clinical practice by demonstrating how optimized LLMs can enhance patient education and cross-linguistic clinician support tools in ptosis-related inquiries.</p>","PeriodicalId":23322,"journal":{"name":"Translational Vision Science & Technology","volume":"14 7","pages":"9"},"PeriodicalIF":2.6000,"publicationDate":"2025-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12279073/pdf/","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Translational Vision Science & Technology","FirstCategoryId":"3","ListUrlMain":"https://doi.org/10.1167/tvst.14.7.9","RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"OPHTHALMOLOGY","Score":null,"Total":0}

Abstract

Purpose: To evaluate the performance of large language models (LLMs), specifically GPT-4, GPT-4o, Qwen2, and Qwen2.5, in addressing patient- and clinician-focused ptosis-related inquiries, with emphasis on cross-lingual applicability and patient-centric assessment.

Methods: We collected 11 patient-centric and 50 doctor-centric questions covering ptosis symptoms, treatment, and postoperative care. Responses generated by GPT-4, GPT-4o, Qwen2, and Qwen2.5 were evaluated against predefined criteria: accuracy, sufficiency, clarity, and depth for doctor questions, and helpfulness, clarity, and empathy for patient questions. Clinical assessments involved 30 patients with ptosis and 8 oculoplastic surgeons rating responses on a 5-point Likert scale.
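
The abstract does not describe the mechanics of querying the models. The following is a minimal, hypothetical sketch (assuming the OpenAI Python SDK; the example questions and settings are illustrative assumptions, not those used in the study) of how free-text responses could be collected from GPT-4o for later rating:

# Hypothetical collection sketch: the study's actual prompts, querying setup,
# and handling of the Qwen models are not specified in the abstract.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

questions = [
    "What causes droopy eyelids, and when is surgery needed?",              # patient-style example
    "What are the indications for frontalis suspension in severe ptosis?",  # doctor-style example
]

def ask(model: str, question: str) -> str:
    """Return a single free-text answer from a chat model."""
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": question}],
        temperature=0,  # deterministic output so raters score a fixed response
    )
    return resp.choices[0].message.content

responses = {q: ask("gpt-4o", q) for q in questions}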

Results: For doctor questions, GPT-4o outperformed Qwen2.5 in overall performance (53.1% vs. 18.8%, P = 0.035) and completeness (P = 0.049). For patient questions, GPT-4o scored higher in helpfulness (mean rank = 175.28 vs. 155.72, P = 0.035), with no significant differences in clarity or empathy. Qwen2.5 exhibited superior Chinese-language clarity compared to English (P = 0.023).
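
The mean-rank figures above are characteristic of a rank-based test on ordinal Likert ratings. As a minimal sketch (assuming Python with SciPy; the ratings below are made-up placeholders, not study data), a Mann-Whitney U comparison and the corresponding mean ranks could be computed as follows:

import numpy as np
from scipy.stats import mannwhitneyu, rankdata

# Placeholder 5-point Likert helpfulness ratings for two models (not study data)
gpt4o = np.array([5, 4, 5, 3, 4, 5, 4, 5])
qwen25 = np.array([4, 3, 4, 3, 5, 3, 4, 4])

# Two-sided Mann-Whitney U test, suitable for ordinal ratings
u_stat, p_value = mannwhitneyu(gpt4o, qwen25, alternative="two-sided")

# Mean ranks over the pooled sample, the quantity reported in the Results
ranks = rankdata(np.concatenate([gpt4o, qwen25]))  # ties receive average ranks
mean_rank_gpt4o = ranks[: len(gpt4o)].mean()
mean_rank_qwen25 = ranks[len(gpt4o):].mean()

print(f"U = {u_stat:.1f}, P = {p_value:.3f}, "
      f"mean ranks: {mean_rank_gpt4o:.2f} vs. {mean_rank_qwen25:.2f}")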

Conclusions: LLMs, particularly GPT-4o, demonstrate robust performance in ptosis-related inquiries, excelling in English and offering clinically valuable insights. Qwen2.5 showed advantages in Chinese clarity. Although promising for patient education and clinician support, these models require rigorous validation, domain-specific training, and cultural adaptation before clinical deployment. Future efforts should focus on refining multilingual capabilities and integrating real-time expert oversight to ensure safety and relevance in diverse healthcare contexts.

Translational relevance: This study bridges artificial intelligence (AI) advancements with clinical practice by demonstrating how optimized LLMs can enhance patient education and cross-linguistic clinician support tools in ptosis-related inquiries.

Source Journal
Translational Vision Science & Technology (Engineering - Biomedical Engineering)
CiteScore: 5.70
Self-citation rate: 3.30%
Annual articles: 346
Average review time: 25 weeks
Journal overview: Translational Vision Science & Technology (TVST), an official journal of the Association for Research in Vision and Ophthalmology (ARVO), an international organization whose purpose is to advance research worldwide into understanding the visual system and preventing, treating and curing its disorders, is an online, open access, peer-reviewed journal emphasizing multidisciplinary research that bridges the gap between basic research and clinical care. A highly qualified and diverse group of Associate Editors and Editorial Board Members is led by Editor-in-Chief Marco Zarbin, MD, PhD, FARVO. The journal covers a broad spectrum of work, including but not limited to: Applications of stem cell technology for regenerative medicine, Development of new animal models of human diseases, Tissue bioengineering, Chemical engineering to improve virus-based gene delivery, Nanotechnology for drug delivery, Design and synthesis of artificial extracellular matrices, Development of a true microsurgical operating environment, Refining data analysis algorithms to improve in vivo imaging technology, Results of Phase 1 clinical trials, Reverse translational ("bedside to bench") research. TVST seeks manuscripts from scientists and clinicians with diverse backgrounds ranging from basic chemistry to ophthalmic surgery that will advance or change the way we understand and/or treat vision-threatening diseases. TVST encourages the use of color, multimedia, hyperlinks, program code and other digital enhancements.