Evaluating Large Language Models for Preoperative Patient Education in Superior Capsular Reconstruction: Comparative Study of Claude, GPT, and Gemini.

Yukang Liu, Hua Li, Jianfeng Ouyang, Zhaowen Xue, Min Wang, Hebei He, Bin Song, Xiaofei Zheng, Wenyi Gan
{"title":"Evaluating Large Language Models for Preoperative Patient Education in Superior Capsular Reconstruction: Comparative Study of Claude, GPT, and Gemini.","authors":"Yukang Liu, Hua Li, Jianfeng Ouyang, Zhaowen Xue, Min Wang, Hebei He, Bin Song, Xiaofei Zheng, Wenyi Gan","doi":"10.2196/70047","DOIUrl":null,"url":null,"abstract":"<p><strong>Background: </strong>Large language models (LLMs) are revolutionizing natural language processing, increasingly applied in clinical settings to enhance preoperative patient education.</p><p><strong>Objective: </strong>This study aimed to evaluate the effectiveness and applicability of various LLMs in preoperative patient education by analyzing their responses to superior capsular reconstruction (SCR)-related inquiries.</p><p><strong>Methods: </strong>In total, 10 sports medicine clinical experts formulated 11 SCR issues and developed preoperative patient education strategies during a webinar, inputting 12 text commands into Claude-3-Opus (Anthropic), GPT-4-Turbo (OpenAI), and Gemini-1.5-Pro (Google DeepMind). A total of 3 experts assessed the language models' responses for correctness, completeness, logic, potential harm, and overall satisfaction, while preoperative education documents were evaluated using DISCERN questionnaire and Patient Education Materials Assessment Tool instruments, and reviewed by 5 postoperative patients for readability and educational value; readability of all responses was also analyzed using the cntext package and py-readability-metrics.</p><p><strong>Results: </strong>Between July 1 and August 17, 2024, sports medicine experts and patients evaluated 33 responses and 3 preoperative patient education documents generated by 3 language models regarding SCR surgery. For the 11 query responses, clinicians rated Gemini significantly higher than Claude in all categories (P<.05) and higher than GPT in completeness, risk avoidance, and overall rating (P<.05). For the 3 educational documents, Gemini's Patient Education Materials Assessment Tool score significantly exceeded Claude's (P=.03), and patients rated Gemini's materials superior in all aspects, with significant differences in educational quality versus Claude (P=.02) and overall satisfaction versus both Claude (P<.01) and GPT (P=.01). GPT had significantly higher readability than Claude on 3 R-based metrics (P<.01). Interrater agreement was high among clinicians and fair among patients.</p><p><strong>Conclusions: </strong>Claude-3-Opus, GPT-4-Turbo, and Gemini-1.5-Pro effectively generated readable presurgical education materials but lacked citations and failed to discuss alternative treatments or the risks of forgoing SCR surgery, highlighting the need for expert oversight when using these LLMs in patient education.</p>","PeriodicalId":73557,"journal":{"name":"JMIR perioperative medicine","volume":"8 ","pages":"e70047"},"PeriodicalIF":0.0000,"publicationDate":"2025-06-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12178570/pdf/","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"JMIR perioperative medicine","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.2196/70047","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

Abstract

Background: Large language models (LLMs) are revolutionizing natural language processing and are increasingly being applied in clinical settings to enhance preoperative patient education.

Objective: This study aimed to evaluate the effectiveness and applicability of various LLMs in preoperative patient education by analyzing their responses to superior capsular reconstruction (SCR)-related inquiries.

Methods: In total, 10 sports medicine clinical experts formulated 11 SCR-related questions and developed preoperative patient education strategies during a webinar, then entered 12 text prompts into Claude-3-Opus (Anthropic), GPT-4-Turbo (OpenAI), and Gemini-1.5-Pro (Google DeepMind). A total of 3 experts assessed the language models' responses for correctness, completeness, logic, potential harm, and overall satisfaction; the preoperative education documents were evaluated with the DISCERN questionnaire and the Patient Education Materials Assessment Tool (PEMAT) and were reviewed by 5 postoperative patients for readability and educational value. Readability of all responses was also analyzed using the cntext package and py-readability-metrics.
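
The abstract names the cntext package and py-readability-metrics for the readability analysis but does not specify which indices were computed or how responses were preprocessed. The snippet below is a minimal sketch of that step using py-readability-metrics only, assuming standard English indices (Flesch Reading Ease, Flesch-Kincaid grade, Gunning Fog, Coleman-Liau); the sample text and metric choices are illustrative, not the authors' actual pipeline.

```python
# Minimal sketch of the readability step described in Methods, using the
# py-readability-metrics package (pip install py-readability-metrics).
# The indices and preprocessing shown here are assumptions; the abstract
# does not specify them.
import nltk
from readability import Readability

nltk.download("punkt", quiet=True)  # sentence tokenizer required by the package


def readability_scores(text: str) -> dict:
    """Score one LLM response; py-readability-metrics needs at least 100 words."""
    r = Readability(text)
    return {
        "flesch_reading_ease": r.flesch().score,
        "flesch_kincaid_grade": r.flesch_kincaid().score,
        "gunning_fog": r.gunning_fog().score,
        "coleman_liau": r.coleman_liau().score,
    }


# Illustrative input: a repeated placeholder sentence standing in for one
# model's answer to a single SCR-related question.
sample_response = " ".join(
    ["Superior capsular reconstruction can restore shoulder stability when the "
     "rotator cuff cannot be repaired directly."] * 15
)
print(readability_scores(sample_response))
```

In the study design, scores like these would be computed for each of the 33 responses and then compared across the 3 models; the corresponding cntext step is not sketched here.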

Results: Between July 1 and August 17, 2024, sports medicine experts and patients evaluated 33 responses and 3 preoperative patient education documents generated by the 3 language models regarding SCR surgery. For the 11 query responses, clinicians rated Gemini significantly higher than Claude in all categories (P<.05) and higher than GPT in completeness, risk avoidance, and overall rating (P<.05). For the 3 educational documents, Gemini's PEMAT score significantly exceeded Claude's (P=.03), and patients rated Gemini's materials superior in all aspects, with significant differences in educational quality versus Claude (P=.02) and in overall satisfaction versus both Claude (P<.01) and GPT (P=.01). GPT had significantly higher readability than Claude on 3 R-based metrics (P<.01). Interrater agreement was high among clinicians and fair among patients.
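
The abstract reports pairwise P values between models and an interrater-agreement finding without naming the statistics used. The sketch below is an illustration only, assuming paired per-question ratings compared with a Wilcoxon signed-rank test; all numbers are invented and do not come from the study.

```python
# Illustration only: the test choice and the rating values are assumptions,
# not the study's method or data. Paired per-question ratings for two models
# are compared with a Wilcoxon signed-rank test from SciPy.
from scipy.stats import wilcoxon

# Hypothetical 1-5 overall-satisfaction ratings for the 11 SCR questions.
gemini_ratings = [5, 5, 4, 5, 4, 5, 5, 4, 5, 5, 4]
claude_ratings = [4, 4, 3, 4, 4, 4, 3, 4, 4, 3, 4]

stat, p = wilcoxon(gemini_ratings, claude_ratings)
print(f"Wilcoxon statistic = {stat:.1f}, P = {p:.3f}")
```

Interrater agreement among the 3 clinicians (or the 5 patients) could likewise be summarized with an intraclass correlation coefficient, for example via pingouin.intraclass_corr, although the abstract does not state which agreement statistic the authors used.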

Conclusions: Claude-3-Opus, GPT-4-Turbo, and Gemini-1.5-Pro effectively generated readable presurgical education materials but lacked citations and failed to discuss alternative treatments or the risks of forgoing SCR surgery, highlighting the need for expert oversight when using these LLMs in patient education.
