Performance of Two Artificial Intelligence Generative Language Models on the Orthopaedic In-Training Examination.

IF 1.1 | CAS Quartile 4 (Medicine) | JCR Q3 (Orthopedics)
Orthopedics | Pub Date: 2024-05-01 | Epub Date: 2024-03-12 | DOI: 10.3928/01477447-20240304-02
Marc Lubitz, Luke Latario
{"title":"Performance of Two Artificial Intelligence Generative Language Models on the Orthopaedic In-Training Examination.","authors":"Marc Lubitz, Luke Latario","doi":"10.3928/01477447-20240304-02","DOIUrl":null,"url":null,"abstract":"<p><strong>Background: </strong>Artificial intelligence (AI) generative large language models are powerful and increasingly accessible tools with potential applications in health care education and training. The annual Orthopaedic In-Training Examination (OITE) is widely used to assess resident academic progress and preparation for the American Board of Orthopaedic Surgery Part 1 Examination.</p><p><strong>Materials and methods: </strong>Open AI's ChatGPT and Google's Bard generative language models were administered the 2022 OITE. Question stems that contained images were input without and then with a text-based description of the imaging findings.</p><p><strong>Results: </strong>ChatGPT answered 69.1% of questions correctly. When provided with text describing accompanying media, this increased to 77.8% correct. In contrast, Bard answered 49.8% of questions correctly. This increased to 58% correct when text describing imaging in question stems was provided (<i>P</i><.0001). ChatGPT was most accurate in questions within the shoulder category, with 90.9% correct. Bard performed best in the sports category, with 65.4% correct. ChatGPT performed above the published mean of Accreditation Council for Graduate Medical Education orthopedic resident test-takers (66%).</p><p><strong>Conclusion: </strong>There is significant variability in the accuracy of publicly available AI models on the OITE. AI generative language software may play numerous potential roles in the future in orthopedic education, including simulating patient presentations and clinical scenarios, customizing individual learning plans, and driving evidence-based case discussion. Further research and collaboration within the orthopedic community is required to safely adopt these tools and minimize risks associated with their use. [<i>Orthopedics</i>. 2024;47(3):e146-e150.].</p>","PeriodicalId":19631,"journal":{"name":"Orthopedics","volume":null,"pages":null},"PeriodicalIF":1.1000,"publicationDate":"2024-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Orthopedics","FirstCategoryId":"3","ListUrlMain":"https://doi.org/10.3928/01477447-20240304-02","RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2024/3/12 0:00:00","PubModel":"Epub","JCR":"Q3","JCRName":"ORTHOPEDICS","Score":null,"Total":0}
Citations: 0

Abstract

Background: Artificial intelligence (AI) generative large language models are powerful and increasingly accessible tools with potential applications in health care education and training. The annual Orthopaedic In-Training Examination (OITE) is widely used to assess resident academic progress and preparation for the American Board of Orthopaedic Surgery Part 1 Examination.

Materials and methods: OpenAI's ChatGPT and Google's Bard generative language models were administered the 2022 OITE. Question stems that contained images were input first without and then with a text-based description of the imaging findings.
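The article does not publish its prompts or any code; the models were queried through their public chat interfaces. As an illustrative sketch only, the following assumes access to OpenAI's chat completions API and shows how a question stem could be submitted once without and once with a text description of the imaging findings. The helper name, the prompt wording, the model identifier, and the sample question are assumptions, not the authors' setup.

```python
# Illustrative sketch only: the study used the public ChatGPT and Bard web
# interfaces, not API calls. Model name, prompt wording, and helpers are assumptions.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def ask_oite_question(stem: str, choices: list[str],
                      image_description: str | None = None) -> str:
    """Submit one multiple-choice question stem, optionally with a text
    description of the accompanying imaging, and return the model's reply."""
    prompt = stem
    if image_description is not None:
        prompt += f"\n\nImaging findings (described in text): {image_description}"
    prompt += ("\n\nAnswer choices:\n" + "\n".join(choices) +
               "\nRespond with the single best answer choice.")
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",  # assumed model; the paper says only "ChatGPT"
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content


# Example: first pass without, second pass with the imaging description
stem = "A 25-year-old man presents after a fall onto an outstretched hand..."
choices = ["A. Cast immobilization", "B. Open reduction and internal fixation",
           "C. Closed reduction", "D. Observation"]
answer_without_image = ask_oite_question(stem, choices)
answer_with_image = ask_oite_question(
    stem, choices,
    image_description="Radiograph shows a displaced scaphoid waist fracture.")
```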

Results: ChatGPT answered 69.1% of questions correctly. When provided with text describing accompanying media, this increased to 77.8% correct. In contrast, Bard answered 49.8% of questions correctly. This increased to 58% correct when text describing imaging in question stems was provided (P<.0001). ChatGPT was most accurate in questions within the shoulder category, with 90.9% correct. Bard performed best in the sports category, with 65.4% correct. ChatGPT performed above the published mean of Accreditation Council for Graduate Medical Education orthopedic resident test-takers (66%).
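The abstract reports accuracies only as percentages and a P value, without the underlying counts or the statistical test used. A minimal sketch of how such a comparison of correct-answer proportions could be checked with a chi-square test, using an assumed number of scored questions (the counts below are illustrative, not the study data):

```python
# Illustrative only: the number of scored questions and the resulting counts
# are assumptions; the paper reports only percentages and P < .0001.
from scipy.stats import chi2_contingency

n_questions = 207                      # assumed number of scored OITE questions
chatgpt_correct = round(0.691 * n_questions)
bard_correct = round(0.498 * n_questions)

# 2x2 contingency table: rows = model, columns = correct / incorrect
table = [
    [chatgpt_correct, n_questions - chatgpt_correct],
    [bard_correct, n_questions - bard_correct],
]
chi2, p_value, dof, expected = chi2_contingency(table)
print(f"chi2 = {chi2:.2f}, p = {p_value:.4g}")
```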

Conclusion: There is significant variability in the accuracy of publicly available AI models on the OITE. AI generative language software may play numerous potential roles in the future in orthopedic education, including simulating patient presentations and clinical scenarios, customizing individual learning plans, and driving evidence-based case discussion. Further research and collaboration within the orthopedic community is required to safely adopt these tools and minimize risks associated with their use. [Orthopedics. 2024;47(3):e146-e150.].

Source journal: Orthopedics (Medicine - Orthopedics)
CiteScore: 2.20
Self-citation rate: 0.00%
Articles per year: 160
Review time: 3 months
Journal description: For over 40 years, Orthopedics, a bimonthly peer-reviewed journal, has been the preferred choice of orthopedic surgeons for clinically relevant information on all aspects of adult and pediatric orthopedic surgery and treatment. Edited by Robert D'Ambrosia, MD, Chairman of the Department of Orthopedics at the University of Colorado, Denver, and former President of the American Academy of Orthopaedic Surgeons, as well as an Editorial Board of over 100 international orthopedists, Orthopedics is the source to turn to for guidance in your practice. The journal offers access to current articles, as well as several years of archived content. Highlights also include Blue Ribbon articles published full text in print and online, as well as Tips & Techniques posted with every issue.