Benjamin Nieves-Lopez, Alexandra R Bechtle, Jennifer Traverse, Christopher Klifto, Bradley S Schoch, Keith T Aziz
{"title":"Evaluating the Evolution of ChatGPT as an Information Resource in Shoulder and Elbow Surgery.","authors":"Benjamin Nieves-Lopez, Alexandra R Bechtle, Jennifer Traverse, Christopher Klifto, Bradley S Schoch, Keith T Aziz","doi":"10.3928/01477447-20250123-03","DOIUrl":null,"url":null,"abstract":"<p><strong>Background: </strong>The purpose of this study was to evaluate the performance and evolution of Chat Generative Pre-Trained Transformer (ChatGPT; OpenAI) as a resource for shoulder and elbow surgery information by assessing its accuracy on the American Academy of Orthopaedic Surgeons shoulder-elbow self-assessment questions. We hypothesized that both ChatGPT models would demonstrate proficiency and that there would be significant improvement with progressive iterations.</p><p><strong>Materials and methods: </strong>A total of 200 questions were selected from the 2019 and 2021 American Academy of Orthopaedic Surgeons shoulder-elbow self-assessment questions. ChatGPT 3.5 and 4 were used to evaluate all questions. Questions with non-text data were excluded (114 questions). Remaining questions were input into ChatGPT and categorized as follows: anatomy, arthroplasty, basic science, instability, miscellaneous, nonoperative, and trauma. ChatGPT's performances were quantified and compared across categories with chi-square tests. The continuing medical education credit threshold of 50% was used to determine proficiency. Statistical significance was set at <i>P</i><.05.</p><p><strong>Results: </strong>ChatGPT 3.5 and 4 answered 52.3% and 73.3% of the questions correctly, respectively (<i>P</i>=.003). ChatGPT 3.5 performed significantly better in the instability category (<i>P</i>=.037). ChatGPT 4's performance did not significantly differ across categories (<i>P</i>=.841). ChatGPT 4 performed significantly better than ChatGPT 3.5 in all categories except instability and miscellaneous.</p><p><strong>Conclusion: </strong>ChatGPT 3.5 and 4 exceeded the proficiency threshold. ChatGPT 4 performed better than ChatGPT 3.5, showing an increased capability to correctly answer shoulder and elbow-focused questions. Further refinement of ChatGPT's training may improve its performance and utility as a resource. Currently, ChatGPT remains unable to answer questions at a high enough accuracy to replace clinical decision-making. [<i>Orthopedics</i>. 202x;4x(x):xx-xx.].</p>","PeriodicalId":19631,"journal":{"name":"Orthopedics","volume":" ","pages":"1-6"},"PeriodicalIF":1.1000,"publicationDate":"2025-01-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Orthopedics","FirstCategoryId":"3","ListUrlMain":"https://doi.org/10.3928/01477447-20250123-03","RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"ORTHOPEDICS","Score":null,"Total":0}
引用次数: 0
Abstract
Background: The purpose of this study was to evaluate the performance and evolution of Chat Generative Pre-Trained Transformer (ChatGPT; OpenAI) as a resource for shoulder and elbow surgery information by assessing its accuracy on the American Academy of Orthopaedic Surgeons shoulder-elbow self-assessment questions. We hypothesized that both ChatGPT models would demonstrate proficiency and that there would be significant improvement with progressive iterations.
Materials and methods: A total of 200 questions were selected from the 2019 and 2021 American Academy of Orthopaedic Surgeons shoulder-elbow self-assessment questions. Questions with non-text data were excluded (114 questions), and the remaining 86 questions were input into ChatGPT 3.5 and ChatGPT 4 and categorized as follows: anatomy, arthroplasty, basic science, instability, miscellaneous, nonoperative, and trauma. Each model's performance was quantified and compared across categories with chi-square tests. The continuing medical education credit threshold of 50% was used to define proficiency. Statistical significance was set at P<.05.
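As an illustration of the kind of chi-square comparison described above, the following is a minimal sketch in Python using SciPy. The correct/incorrect counts are back-calculated from the reported accuracies (52.3% and 73.3% of 86 text-only questions) and are illustrative assumptions, not the authors' raw data, so the resulting statistics will only approximate the published values.

```python
# Minimal sketch of a chi-square comparison of two models' accuracies.
# Counts are back-calculated from the reported percentages (assumed, not raw data).
from scipy.stats import chi2_contingency

TOTAL_QUESTIONS = 86  # 200 selected - 114 excluded for non-text data

# Rows: ChatGPT 3.5, ChatGPT 4; columns: correct, incorrect
observed = [
    [45, TOTAL_QUESTIONS - 45],  # ~52.3% correct
    [63, TOTAL_QUESTIONS - 63],  # ~73.3% correct
]

chi2, p_value, dof, expected = chi2_contingency(observed)
print(f"chi-square = {chi2:.2f}, p = {p_value:.3f}")

# Proficiency check against the 50% continuing medical education credit threshold
for model, correct in (("ChatGPT 3.5", 45), ("ChatGPT 4", 63)):
    accuracy = correct / TOTAL_QUESTIONS
    status = "exceeds" if accuracy > 0.5 else "falls below"
    print(f"{model}: {accuracy:.1%} ({status} the 50% threshold)")
```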
Results: ChatGPT 3.5 and ChatGPT 4 answered 52.3% and 73.3% of the questions correctly, respectively (P=.003). ChatGPT 3.5 performed significantly better in the instability category than in the other categories (P=.037), whereas ChatGPT 4's performance did not differ significantly across categories (P=.841). ChatGPT 4 performed significantly better than ChatGPT 3.5 in all categories except instability and miscellaneous.
Conclusion: ChatGPT 3.5 and ChatGPT 4 both exceeded the proficiency threshold. ChatGPT 4 performed better than ChatGPT 3.5, showing an increased ability to correctly answer shoulder- and elbow-focused questions. Further refinement of ChatGPT's training may improve its performance and utility as a resource. Currently, ChatGPT remains unable to answer questions accurately enough to replace clinical decision-making. [Orthopedics. 202x;4x(x):xx-xx.].
Journal Introduction
For over 40 years, Orthopedics, a bimonthly peer-reviewed journal, has been the preferred choice of orthopedic surgeons for clinically relevant information on all aspects of adult and pediatric orthopedic surgery and treatment. Edited by Robert D'Ambrosia, MD, Chairman of the Department of Orthopedics at the University of Colorado, Denver, and former President of the American Academy of Orthopaedic Surgeons, as well as an Editorial Board of over 100 international orthopedists, Orthopedics is the source to turn to for guidance in your practice.
The journal offers access to current articles, as well as several years of archived content. Highlights also include Blue Ribbon articles published full text in print and online, as well as Tips & Techniques posted with every issue.