Clark J. Chen, Vivek K. Bilolikar, Duncan VanNest, James Raphael, Gene Shaffer
Artificial intelligence in orthopaedic education: A comparative analysis of ChatGPT and Bing AI's Orthopaedic In-Training Examination performance
Medicine Advances, vol. 2, no. 3, pp. 284-290. Published 2024-09-15. DOI: 10.1002/med4.77 (https://onlinelibrary.wiley.com/doi/10.1002/med4.77)
Abstract
Background
This study evaluated the performance of generative artificial intelligence (AI) models on the Orthopaedic In-Training Examination (OITE), an annual exam administered to U.S. orthopaedic residency programs.
Methods
ChatGPT 3.5 and Bing AI GPT 4.0 were evaluated on standardised sets of multiple-choice questions drawn from the American Academy of Orthopaedic Surgeons OITE online question bank spanning 5 years (2018–2022). A total of 1165 questions were posed to each AI system. To standardise the comparison, the latest available versions of ChatGPT 3.5 and Bing AI GPT 4.0 were used. Historical resident scores taken from the annual OITE technical reports were used as a comparison.
Results
Across the five datasets, ChatGPT 3.5 scored an average of 55.0% on the OITE questions. Bing AI GPT 4.0 scored higher, with an average of 80.0%. In comparison, the average performance of orthopaedic residents in nationally accredited programs was 62.1%. Bing AI GPT 4.0 outperformed both ChatGPT 3.5 and Accreditation Council for Graduate Medical Education examinees, and analysis of variance demonstrated p < 0.001 among groups. The best performance was by Bing AI GPT 4.0 on OITE 2020.
Conclusion
Generative AI can provide logical context for its answers through in-depth information searches and citation of resources. This combination presents a convincing argument for the possible use of AI in medical education as an interactive learning aid.