ChatGPT Earns American Board Certification in Hand Surgery

IF 0.9 · Zone 4 (Medicine) · Q4 ORTHOPEDICS

Hand Surgery & Rehabilitation · Pub Date: 2024-06-01 · DOI: 10.1016/j.hansur.2024.101688

Diane Ghanem, Joseph E. Nassar, Joseph El Bachour, Tammam Hanna

Hand Surgery & Rehabilitation, volume 43, issue 3, Article 101688. Publication type: Journal Article. Open access: no. Full text: https://www.sciencedirect.com/science/article/pii/S2468122924000653

Citations: 0

Abstract

Purpose

Artificial Intelligence (AI), and specifically ChatGPT, has shown potential in healthcare, yet its performance in specialized medical examinations such as the Orthopaedic Surgery In-Training Examination and European Board Hand Surgery diploma has been inconsistent. This study aims to evaluate the capability of ChatGPT-4 to pass the American Hand Surgery Certifying Examination.

Methods

ChatGPT-4 was tested on the 2019 American Society for Surgery of the Hand (ASSH) Self-Assessment Exam. All 200 questions available online (https://onlinecme.assh.org) were retrieved. All media-containing questions were flagged and carefully reviewed. Eight media-containing questions were excluded as they either relied purely on videos or could not be rationalized from the presented information. Descriptive statistics were used to summarize the performance (% correct) of ChatGPT-4. The ASSH report was used to compare ChatGPT-4’s performance to that of the 322 physicians who completed the 2019 ASSH self-assessment.

Results

ChatGPT-4 answered 192 questions with an overall score of 61.98%. Performance on media-containing questions was 55.56%, while on non-media questions it was 65.83%, with no statistically significant difference in performance based on media inclusion. Despite scoring below the average physician’s performance, ChatGPT-4 outperformed in the ‘vascular’ section with 81.82%. Its performance was lower in the ‘bone and joint’ (48.54%) and ‘neuromuscular’ (56.25%) sections.
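The percentages above can be cross-checked with a little arithmetic. The sketch below is illustrative only: the abstract does not give the number of media and non-media questions, so the script searches for splits of the 192 questions that reproduce the reported percentages. It then applies a Pearson chi-square test without continuity correction, which is an assumption, since the abstract does not say which statistical test the authors used. The function names are hypothetical.

```python
import math

# Figures reported in the abstract.
TOTAL = 192
OVERALL_PCT, MEDIA_PCT, NON_MEDIA_PCT = 61.98, 55.56, 65.83


def matches(correct, n, pct):
    """True if correct/n rounds to the reported percentage (two decimals)."""
    return abs(100 * correct / n - pct) < 0.005


def candidate_splits():
    """Find (n_media, correct_media, n_non_media, correct_non_media) tuples
    consistent with all three reported percentages. These counts are
    inferred, not stated in the abstract."""
    total_correct = round(OVERALL_PCT / 100 * TOTAL)
    found = []
    for n_media in range(1, TOTAL):
        n_non = TOTAL - n_media
        c_media = round(MEDIA_PCT / 100 * n_media)
        c_non = total_correct - c_media
        if matches(c_media, n_media, MEDIA_PCT) and matches(c_non, n_non, NON_MEDIA_PCT):
            found.append((n_media, c_media, n_non, c_non))
    return found


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (1 degree of freedom, no continuity correction)
    for the 2x2 table [[a, b], [c, d]]. Returns (statistic, p_value)."""
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Survival function of chi-square with 1 df equals erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


for n_media, c_media, n_non, c_non in candidate_splits():
    chi2, p = chi_square_2x2(c_media, n_media - c_media, c_non, n_non - c_non)
    print(f"media {c_media}/{n_media}, non-media {c_non}/{n_non}: "
          f"chi2={chi2:.2f}, p={p:.3f}")
```

Under these assumptions, a split of 72 media and 120 non-media questions reproduces the reported figures, and the resulting p-value is above 0.05, which is consistent with the abstract's statement that media inclusion made no significant difference.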

Conclusions

ChatGPT-4 achieved a good overall score of 61.98%. This AI language model demonstrates significant capability in processing and answering specialized medical examination questions, albeit with room for improvement in areas requiring complex clinical judgment and nuanced interpretation. ChatGPT-4’s proficiency is influenced by the structure and language of the examination, and it is no replacement for the depth of expertise of trained medical specialists. This study underscores the supportive role of AI in medical education and clinical decision-making while highlighting the current limitations in nuanced fields such as hand surgery.


Source journal

Hand Surgery & Rehabilitation (Medicine: Surgery)

CiteScore

1.70

Self-citation rate

27.30%

Articles published

Review time

49 days

About the journal: As the official publication of the French, Belgian and Swiss Societies for Surgery of the Hand, as well as of the French Society of Rehabilitation of the Hand & Upper Limb, "Hand Surgery and Rehabilitation" (formerly named "Chirurgie de la Main") publishes original articles, literature reviews, technical notes, and clinical cases. It is indexed in the main international databases (including Medline). Initially a platform for French-speaking hand surgeons, the journal now publishes its articles in English to disseminate its authors' scientific findings more widely. The journal also includes a biannual supplement in French, the monograph of the French Society for Surgery of the Hand, where comprehensive reviews in the fields of hand, peripheral nerve and upper limb surgery are presented. Official organ of the Société française de chirurgie de la main, the Société française de Rééducation de la main (SFRM-GEMMSOR), the Société suisse de chirurgie de la main and the Belgian Hand Group, and indexed in the major international databases (Medline, Embase, Pascal, Scopus), Hand Surgery and Rehabilitation (formerly titled Chirurgie de la main) publishes original articles, literature reviews, technical notes, and clinical cases. Initially a French-language platform for the specialty, the journal is now moving toward English to become a scientific and educational reference for the specialty in France and in Europe. With 6 issues in English per year, the journal also includes a biannual supplement, the GEM monograph, which presents, in French, comprehensive reviews in the fields of hand, peripheral nerve and upper limb surgery.