ChatGPT, Bard, and Bing Chat are large language processing models that answered OITE questions with a similar accuracy to first-year orthopaedic surgery residents.

IF 4.4 · CAS Medicine Tier 1 · JCR Q1 (Orthopedics)
Gage A Guerra, Hayden L Hofmann, Jonathan L Le, Alexander M Wong, Amir Fathi, Cory K Mayfield, Frank A Petrigliano, Joseph N Liu
Arthroscopy: The Journal of Arthroscopic and Related Surgery · Published 2024-08-27 · DOI: 10.1016/j.arthro.2024.08.023
Citations: 0

Abstract

Purpose: To assess the ability of ChatGPT, Bard, and BingChat to generate accurate orthopaedic diagnoses and corresponding treatments by comparing their performance on the Orthopaedic In-Training Examination (OITE) with that of orthopaedic trainees.

Methods: OITE question sets from 2021 and 2022 were compiled into a set of 420 questions. ChatGPT (GPT-3.5), Bard, and BingChat were instructed to select one of the provided answer choices for each question. Accuracy on the composite question set was recorded and compared with that of human cohorts, including medical students and orthopaedic residents stratified by post-graduate year.

Results: ChatGPT correctly answered 46.3% of the composite questions, whereas BingChat correctly answered 52.4% and Bard 51.4%. After excluding image-associated questions, the overall accuracies of ChatGPT, BingChat, and Bard improved to 49.1%, 53.5%, and 56.8%, respectively. Medical students and orthopaedic residents (PGY1-5) correctly answered 30.8%, 53.1%, 60.4%, 66.6%, 70.0%, and 71.9% of questions, respectively.
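The accuracy comparison described above can be sketched as follows. This is a minimal illustration, not the authors' analysis code; the question format, the `has_image` and `correct_choice` fields, and all data below are hypothetical placeholders.

```python
# Illustrative sketch of the accuracy analysis described in the Methods
# and Results. Question records and model answers are hypothetical.

def accuracy(questions, answers, exclude_images=False):
    """Fraction of questions answered correctly, optionally
    excluding image-associated questions (as done for the LLMs)."""
    pairs = [
        (q, a) for q, a in zip(questions, answers)
        if not (exclude_images and q["has_image"])
    ]
    correct = sum(q["correct_choice"] == a for q, a in pairs)
    return correct / len(pairs)

# Toy example: 4 questions, one of which has an associated image.
questions = [
    {"correct_choice": "A", "has_image": False},
    {"correct_choice": "C", "has_image": True},
    {"correct_choice": "B", "has_image": False},
    {"correct_choice": "D", "has_image": False},
]
model_answers = ["A", "B", "B", "C"]

print(accuracy(questions, model_answers))                       # 0.5
print(accuracy(questions, model_answers, exclude_images=True))  # ≈ 0.667
```

Dropping image-associated questions shrinks the denominator as well as the numerator, which is why the models' accuracies can rise when those questions are excluded.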

Conclusion: ChatGPT, Bard, and BingChat are AI models that answered OITE questions with an accuracy similar to that of first-year orthopaedic surgery residents, and they achieved this result without the images and other supplementary media provided to human test takers.

Source journal metrics: CiteScore 9.30 · Self-citation rate 17.00% · Annual articles: 555 · Review time: 58 days
Journal description: Nowhere is minimally invasive surgery explained better than in Arthroscopy, the leading peer-reviewed journal in the field. Every issue enables you to put into perspective the usefulness of the various emerging arthroscopic techniques. The advantages and disadvantages of these methods, along with their applications in various situations, are discussed in relation to their efficiency, efficacy, and cost benefit. As a special incentive, paid subscribers also receive access to the journal's expanded website.