ChatGPT, Bard, and Bing Chat are large language processing models that answered OITE questions with a similar accuracy to first-year orthopaedic surgery residents.
Gage A Guerra, Hayden L Hofmann, Jonathan L Le, Alexander M Wong, Amir Fathi, Cory K Mayfield, Frank A Petrigliano, Joseph N Liu
Arthroscopy: The Journal of Arthroscopic and Related Surgery. Journal article, published 2024-08-27. DOI: 10.1016/j.arthro.2024.08.023
Abstract
Purpose: To assess the ability of ChatGPT, Bard, and BingChat to generate accurate orthopaedic diagnoses or corresponding treatments by comparing their performance on the Orthopaedic In-Training Examination (OITE) with that of orthopaedic trainees.
Methods: OITE question sets from 2021 and 2022 were combined into a single set of 420 questions. ChatGPT (GPT-3.5), Bard, and BingChat were each instructed to select one of the provided answer choices for every question. Accuracy on the composite question set was recorded and compared with that of human cohorts, including medical students and orthopaedic residents stratified by postgraduate year (PGY).
Results: ChatGPT correctly answered 46.3% of the composite questions on the OITE, whereas BingChat correctly answered 52.4% and Bard 51.4%. After image-associated questions were excluded, the overall accuracies of ChatGPT, BingChat, and Bard improved to 49.1%, 53.5%, and 56.8%, respectively. Medical students correctly answered 30.8% of questions, and orthopaedic residents in PGY1 through PGY5 correctly answered 53.1%, 60.4%, 66.6%, 70.0%, and 71.9%, respectively.
Conclusion: ChatGPT, Bard, and BingChat are AI models that answered OITE questions with an accuracy similar to that of first-year orthopaedic surgery residents. They achieved this without access to the images or other supplementary media provided to human test takers.
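The arithmetic behind the Results and Conclusion can be reproduced from the reported percentages alone. The sketch below, assuming only the accuracies stated in the abstract (no raw question counts), computes each model's percentage-point gain after image-associated questions are dropped and finds the human cohort whose composite accuracy is numerically closest to each model's composite accuracy. The names `image_exclusion_gain` and `nearest_cohort` are hypothetical helpers, and the nearest-value comparison is only a descriptive illustration; it is not the comparative analysis used in the study, which the abstract does not specify.

```python
# Accuracies (%) as reported in the abstract; no per-question data is assumed.
MODEL_COMPOSITE = {"ChatGPT": 46.3, "BingChat": 52.4, "Bard": 51.4}
MODEL_TEXT_ONLY = {"ChatGPT": 49.1, "BingChat": 53.5, "Bard": 56.8}
HUMAN_COHORTS = {
    "Medical students": 30.8,
    "PGY1": 53.1,
    "PGY2": 60.4,
    "PGY3": 66.6,
    "PGY4": 70.0,
    "PGY5": 71.9,
}


def image_exclusion_gain(composite, text_only):
    """Hypothetical helper: percentage-point change after excluding image questions."""
    return {name: round(text_only[name] - composite[name], 1) for name in composite}


def nearest_cohort(accuracy, cohorts):
    """Hypothetical helper: cohort with the smallest absolute accuracy difference."""
    return min(cohorts, key=lambda name: abs(cohorts[name] - accuracy))


if __name__ == "__main__":
    gains = image_exclusion_gain(MODEL_COMPOSITE, MODEL_TEXT_ONLY)
    for model, gain in gains.items():
        print(f"{model}: +{gain} points without image-associated questions")
    for model, acc in MODEL_COMPOSITE.items():
        print(f"{model} ({acc}%) is closest to {nearest_cohort(acc, HUMAN_COHORTS)}")
```

With these figures, every model's composite accuracy falls closest to the PGY1 cohort, consistent with the stated conclusion, and Bard shows the largest gain (5.4 points) once image-associated questions are removed.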
Journal description:
Nowhere is minimally invasive surgery explained better than in Arthroscopy, the leading peer-reviewed journal in the field. Every issue enables you to put into perspective the usefulness of the various emerging arthroscopic techniques. The advantages and disadvantages of these methods, along with their applications in various situations, are discussed in relation to their efficiency, efficacy, and cost benefit. As a special incentive, paid subscribers also receive access to the journal's expanded website.