Do ChatGPT and Gemini's Recommendations Align With Established Guidelines for Hand and Upper Extremity Surgery?

Impact factor: 1.8; JCR quartile: Q2 (Orthopedics)
HAND; published 2025-09-18; DOI: 10.1177/15589447251371089
Yibin B Zhang, Fielding S Fischer, Matthew V Abola, Daniel A Osei, Scott W Wolfe, Troy B Amen
Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12446276/pdf/
Citations: 0

Abstract

Background: The use of large language models (LLMs) such as ChatGPT and Gemini in clinical settings has surged, presenting potential benefits in reducing administrative workload and enhancing patient communication. However, concerns about the clinical accuracy of these tools persist. This study evaluated the concordance of ChatGPT and Gemini's recommendations with American Academy of Orthopaedic Surgeons (AAOS) clinical practice guidelines (CPGs) for carpal tunnel syndrome, distal radius fractures, and glenohumeral joint osteoarthritis.

Methods: ChatGPT (version 4o) and Gemini (version 1.5 Flash) were queried using structured text-based prompts aligned with AAOS CPGs. The LLMs' outputs were analyzed by blinded reviewers to determine concordance with the guidelines. Concordance rates were compared across models, topics, and guideline strength using descriptive statistics and McNemar's test. The transparency of responses, including source citation, was also assessed.
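The abstract names McNemar's test without describing it. For a paired comparison like this one, where each guideline item is presumably posed to both models, the test uses only the discordant pairs: b, items where ChatGPT was concordant and Gemini was not, and c, the reverse. The difference between the two models' concordant counts equals b − c, but the abstract does not report b + c, so P = .131 cannot be recomputed from it. The Python sketch below is illustrative only. The function names and the counts in the usage lines are hypothetical, not the study's data. The abstract also does not say whether the authors used the exact variant or the continuity-corrected chi-square variant, so both are shown.

```python
import math


def mcnemar_exact(b: int, c: int) -> float:
    """Two-sided exact McNemar p-value (hypothetical helper).

    b: items concordant for model A only; c: items concordant for model B only.
    Under the null hypothesis of equal concordance, b ~ Binomial(b + c, 0.5).
    """
    n = b + c
    if n == 0:
        return 1.0  # no discordant pairs: no evidence of a difference
    k = min(b, c)
    # One-tail probability P(X <= k), doubled for a two-sided test and capped at 1.
    tail = sum(math.comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


def mcnemar_chi2(b: int, c: int, correction: bool = True) -> tuple[float, float]:
    """McNemar chi-square statistic (1 df) and its p-value (hypothetical helper).

    With correction=True, Edwards' continuity correction (|b - c| - 1) is applied.
    """
    n = b + c
    if n == 0:
        return 0.0, 1.0
    diff = abs(b - c)
    if correction:
        diff = max(diff - 1, 0)
    stat = diff ** 2 / n
    # Survival function of chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    p = math.erfc(math.sqrt(stat / 2))
    return stat, p


if __name__ == "__main__":
    # Hypothetical discordant counts, NOT taken from the study.
    b, c = 12, 4
    print(f"exact p = {mcnemar_exact(b, c):.4f}")
    stat, p = mcnemar_chi2(b, c)
    print(f"chi2 = {stat:.3f}, p = {p:.4f}")
```

With real data, b and c would come from cross-tabulating the two models' per-item concordance ratings. The exact form is commonly preferred when b + c is small.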

Results: A total of 174 recommendations were generated, with an overall concordance rate of 62.1%. When comparing concordance rates between LLMs, there was no statistically significant difference between ChatGPT and Gemini (66.7% vs 57.5%, P = .131). Concordance varied by topic and guideline strength, with ChatGPT performing best for moderately supported guidelines. Both models demonstrated low citation transparency. Gemini provided sources for 39.1% of recommendations, significantly more than ChatGPT's 3.5% (P < .0001).

Conclusions: Beyond modest concordance rates, both models exhibited significant limitations, including variability across topics and guideline strengths and insufficient citation transparency. These findings highlight the challenges in integrating LLMs into clinical practice and emphasize the need for further refinement and evaluation before adoption in hand surgery.

Source journal
HAND (Medicine: Surgery)
CiteScore: 3.30
Self-citation rate: 0.00%
Articles published: 209
About the journal: HAND is the official journal of the American Association for Hand Surgery and is a peer-reviewed journal featuring articles written by clinicians worldwide presenting current research and clinical work in the field of hand surgery. It features articles related to all aspects of hand and upper extremity surgery and the postoperative care and rehabilitation of the hand.