Performance of ChatGPT on the Plastic Surgery In-Training Examination.

Eplasty. 2024;24:e68. Published online December 18, 2024 (eCollection 2024). Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12132409/pdf/
Brielle E Raine, Katherine A Kozlowski, Cody C Fowler, Jordan D Frey
{"title":"Performance of ChatGPT on the Plastic Surgery In-Training Examination.","authors":"Brielle E Raine, Katherine A Kozlowski, Cody C Fowler, Jordan D Frey","doi":"","DOIUrl":null,"url":null,"abstract":"<p><strong>Background: </strong>Recently, the artificial intelligence chatbot Chat Generative Pre-Trained Transformer (ChatGPT) performed well on all United States Medical Licensing Examinations (USMLE), demonstrating a high level of insight into a physician's knowledge base and clinical reasoning ability.<sup>1,2</sup> This study aims to evaluate the performance of ChatGPT on the American Society of Plastic Surgeons (ASPS) Plastic Surgery In-Training Examination (PSITE) to assess its clinical reasoning and decision-making ability and investigate its legitimacy related to plastic surgery competencies.</p><p><strong>Methods: </strong>PSITE questions from 2015 to 2023 were included in this study. Questions with images, charts, and graphs were excluded. ChatGPT 3.5 was prompted to provide the best single letter answer choice. Performance was analyzed across test years, question area of content, taxonomy, and core competency via chi-square analysis. Multivariable logistic regression was performed to identify predictors of ChatGPT performance.</p><p><strong>Results: </strong>In this study, 1850 of 2097 multiple choice questions were included. ChatGPT answered 845 (45.7%) questions correctly, performing the highest on breast/cosmetic topics (49.6%) (<i>P</i> = .070). ChatGPT performed significantly better on questions requiring the lowest level of reasoning (knowledge, 55.1%) compared with more complex questions such as analysis (41.4%) (<i>P</i> = .001). Multivariable analysis identified negative predictors of performance including the hand/lower extremity topic (OR = 0.73, <i>P</i> = .038) and taxonomy levels beyond knowledge (<i>P</i> < .05). Performance on the 2023 exam (53.4%) corresponded to a 4th percentile score when compared with all plastic surgery residents.</p><p><strong>Conclusions: </strong>While ChatGPT's performance has shown promise in other medical domains, our results indicate it may not be a reliable source of information for plastic surgery-related questions or decision-making.</p>","PeriodicalId":93993,"journal":{"name":"Eplasty","volume":"24 ","pages":"e68"},"PeriodicalIF":0.0000,"publicationDate":"2024-12-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12132409/pdf/","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Eplasty","FirstCategoryId":"1085","ListUrlMain":"","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2024/1/1 0:00:00","PubModel":"eCollection","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

Background: Recently, the artificial intelligence chatbot Chat Generative Pre-Trained Transformer (ChatGPT) performed well on all steps of the United States Medical Licensing Examination (USMLE), demonstrating a high level of insight into a physician's knowledge base and clinical reasoning ability.1,2 This study aims to evaluate the performance of ChatGPT on the American Society of Plastic Surgeons (ASPS) Plastic Surgery In-Training Examination (PSITE) to assess its clinical reasoning and decision-making ability and to investigate its legitimacy with respect to plastic surgery competencies.

Methods: PSITE questions from 2015 to 2023 were included in this study. Questions with images, charts, and graphs were excluded. ChatGPT 3.5 was prompted to provide the best single-letter answer choice. Performance was analyzed across test years, question content area, taxonomy level, and core competency via chi-square analysis. Multivariable logistic regression was performed to identify predictors of ChatGPT performance.
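The analysis described above (chi-square comparisons across content areas and taxonomy levels, plus multivariable logistic regression on per-question correctness) could be reproduced along the following lines. This is an illustrative sketch, not the authors' code: the column names, category labels, and simulated data are assumptions standing in for the real per-question dataset.

```python
import numpy as np
import pandas as pd
from scipy.stats import chi2_contingency
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 400

# Hypothetical per-question records: one row per PSITE question.
df = pd.DataFrame({
    "topic": rng.choice(
        ["breast_cosmetic", "hand_lower_extremity", "craniofacial", "core"], size=n
    ),
    "taxonomy": rng.choice(["knowledge", "application", "analysis"], size=n),
})

# Simulate correctness with rates loosely mirroring the reported trend
# (higher accuracy on knowledge-level than on analysis-level questions).
base_rate = {"knowledge": 0.55, "application": 0.45, "analysis": 0.41}
df["correct"] = rng.binomial(1, df["taxonomy"].map(base_rate).to_numpy())

# Chi-square test of independence: taxonomy level vs. correctness.
table = pd.crosstab(df["taxonomy"], df["correct"])
chi2, p, dof, _ = chi2_contingency(table)
print(f"chi-square = {chi2:.2f}, p = {p:.3f}")

# Multivariable logistic regression: odds of a correct answer by topic and
# taxonomy level, with "knowledge" as the reference taxonomy level.
model = smf.logit(
    "correct ~ C(topic) + C(taxonomy, Treatment('knowledge'))", data=df
).fit(disp=0)
print(np.exp(model.params))  # coefficients exponentiated to odds ratios
```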

Results: In this study, 1850 of 2097 multiple-choice questions were included. ChatGPT answered 845 (45.7%) questions correctly, performing best on breast/cosmetic topics (49.6%) (P = .070). ChatGPT performed significantly better on questions requiring the lowest level of reasoning (knowledge, 55.1%) compared with more complex questions such as analysis (41.4%) (P = .001). Multivariable analysis identified negative predictors of performance, including the hand/lower extremity content area (OR = 0.73, P = .038) and taxonomy levels beyond knowledge (P < .05). Performance on the 2023 exam (53.4%) corresponded to a 4th percentile score when compared with all plastic surgery residents.
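As a quick arithmetic check of the overall accuracy reported above (845 of 1850 questions), the sketch below recomputes the proportion and adds a Wilson 95% confidence interval; the interval is not reported in the abstract and is included only for illustration.

```python
from statsmodels.stats.proportion import proportion_confint

# Figures taken from the abstract: 845 correct answers of 1850 included questions.
correct, total = 845, 1850
accuracy = correct / total
low, high = proportion_confint(correct, total, alpha=0.05, method="wilson")
print(f"overall accuracy = {accuracy:.1%} (95% CI {low:.1%}-{high:.1%})")
# Prints 45.7%, matching the reported value; the CI itself is illustrative only.
```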

Conclusions: While ChatGPT's performance has shown promise in other medical domains, our results indicate it may not be a reliable source of information for plastic surgery-related questions or decision-making.
