Matthew L. Magruder, Michael Miskiewicz, Ariel N. Rodriguez, Mitchell Ng, Amr Abdelgawad
{"title":"ChatGPT plus(4.0版本)与预训练AI模型(Orthopod)在骨科培训考试(OITE)中的比较","authors":"Matthew L. Magruder , Michael Miskiewicz , Ariel N. Rodriguez , Mitchell Ng , Amr Abdelgawad","doi":"10.1016/j.surge.2025.04.004","DOIUrl":null,"url":null,"abstract":"<div><h3>Introduction</h3><div>Recent advancements in large language model (LLM) artificial intelligence (AI) systems, like ChatGPT, have showcased ability in answering standardized examination questions, but their performance is variable. The goal of this study was to compare the performance of standard ChatGPT-4 with a custom-trained ChatGPT model taking the Orthopaedic Surgery In-Training Examination (OITE).</div></div><div><h3>Methods</h3><div>Practice questions for the 2022 OITE, made available on the AAOS-ResStudy website (aaos.org/education/examinations/ResStudy), were used for this study. Question stems were uploaded to both standard ChatGPT-4 and the custom-trained ChatGPT model (Orthopod), and the responses were documented as correct or incorrect. For questions containing media elements, screenshots were converted to PNG files and uploaded to ChatGPT. Evaluation of the AI's performance included descriptive statistics to determine the percent of questions answered correctly or incorrectly.</div></div><div><h3>Results</h3><div>Two-hundred and seven questions were analyzed with both ChatGPT 4.0 and Orthopod. ChatGPT correctly answered 73.43 % (152/207) of the questions, while Orthopod correctly answered 71.01 % (147/207) of the questions. There was no significant difference in performance of either language model based on inclusion of media or question category.</div></div><div><h3>Conclusion</h3><div>ChatGPT 4.0 and Orthopod correctly answered 73.43 % and 71.01 % of OITE practice questions correctly. Both systems provided well-reasoned answers in response to multiple choice questions. The thoughtfully articulated responses and well-supported explanations offered by both systems may prove to be a valuable educational resource for orthopedic residents as they prepare for upcoming board-style exams.</div></div><div><h3>Level of evidence</h3><div>IV.</div></div>","PeriodicalId":49463,"journal":{"name":"Surgeon-Journal of the Royal Colleges of Surgeons of Edinburgh and Ireland","volume":"23 3","pages":"Pages 187-191"},"PeriodicalIF":2.3000,"publicationDate":"2025-04-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Comparison of ChatGPT plus (version 4.0) and pretrained AI model (Orthopod) on orthopaedic in-training exam (OITE)\",\"authors\":\"Matthew L. Magruder , Michael Miskiewicz , Ariel N. Rodriguez , Mitchell Ng , Amr Abdelgawad\",\"doi\":\"10.1016/j.surge.2025.04.004\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><h3>Introduction</h3><div>Recent advancements in large language model (LLM) artificial intelligence (AI) systems, like ChatGPT, have showcased ability in answering standardized examination questions, but their performance is variable. The goal of this study was to compare the performance of standard ChatGPT-4 with a custom-trained ChatGPT model taking the Orthopaedic Surgery In-Training Examination (OITE).</div></div><div><h3>Methods</h3><div>Practice questions for the 2022 OITE, made available on the AAOS-ResStudy website (aaos.org/education/examinations/ResStudy), were used for this study. 
Question stems were uploaded to both standard ChatGPT-4 and the custom-trained ChatGPT model (Orthopod), and the responses were documented as correct or incorrect. For questions containing media elements, screenshots were converted to PNG files and uploaded to ChatGPT. Evaluation of the AI's performance included descriptive statistics to determine the percent of questions answered correctly or incorrectly.</div></div><div><h3>Results</h3><div>Two-hundred and seven questions were analyzed with both ChatGPT 4.0 and Orthopod. ChatGPT correctly answered 73.43 % (152/207) of the questions, while Orthopod correctly answered 71.01 % (147/207) of the questions. There was no significant difference in performance of either language model based on inclusion of media or question category.</div></div><div><h3>Conclusion</h3><div>ChatGPT 4.0 and Orthopod correctly answered 73.43 % and 71.01 % of OITE practice questions correctly. Both systems provided well-reasoned answers in response to multiple choice questions. The thoughtfully articulated responses and well-supported explanations offered by both systems may prove to be a valuable educational resource for orthopedic residents as they prepare for upcoming board-style exams.</div></div><div><h3>Level of evidence</h3><div>IV.</div></div>\",\"PeriodicalId\":49463,\"journal\":{\"name\":\"Surgeon-Journal of the Royal Colleges of Surgeons of Edinburgh and Ireland\",\"volume\":\"23 3\",\"pages\":\"Pages 187-191\"},\"PeriodicalIF\":2.3000,\"publicationDate\":\"2025-04-22\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Surgeon-Journal of the Royal Colleges of Surgeons of Edinburgh and Ireland\",\"FirstCategoryId\":\"3\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S1479666X2500054X\",\"RegionNum\":4,\"RegionCategory\":\"医学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q2\",\"JCRName\":\"SURGERY\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Surgeon-Journal of the Royal Colleges of Surgeons of Edinburgh and Ireland","FirstCategoryId":"3","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S1479666X2500054X","RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"SURGERY","Score":null,"Total":0}
Comparison of ChatGPT plus (version 4.0) and pretrained AI model (Orthopod) on orthopaedic in-training exam (OITE)
Introduction
Recent advances in large language model (LLM) artificial intelligence (AI) systems, such as ChatGPT, have demonstrated the ability to answer standardized examination questions, but their performance is variable. The goal of this study was to compare the performance of standard ChatGPT-4 with that of a custom-trained ChatGPT model on the Orthopaedic Surgery In-Training Examination (OITE).
Methods
Practice questions for the 2022 OITE, made available on the AAOS-ResStudy website (aaos.org/education/examinations/ResStudy), were used for this study. Question stems were uploaded to both standard ChatGPT-4 and the custom-trained ChatGPT model (Orthopod), and each response was recorded as correct or incorrect. For questions containing media elements, screenshots were converted to PNG files and uploaded to ChatGPT. Performance was evaluated with descriptive statistics, reporting the percentage of questions answered correctly and incorrectly.
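The abstract does not include the authors' analysis code; the following Python sketch only illustrates how such a grading tally and the descriptive statistics might be computed, assuming a hypothetical CSV file (oite_responses.csv) with one row per question and hand-graded correctness columns for each model.

```python
# Minimal sketch (not the authors' code): tally hand-graded responses from a
# hypothetical CSV with columns: question_id, category, has_media,
# chatgpt4_correct, orthopod_correct (1 = correct, 0 = incorrect).
import csv

def percent_correct(rows, key):
    """Return (n_correct, n_total, percent correct) for one model's column."""
    graded = [int(row[key]) for row in rows]
    n_correct = sum(graded)
    n_total = len(graded)
    return n_correct, n_total, 100.0 * n_correct / n_total

with open("oite_responses.csv", newline="") as f:
    rows = list(csv.DictReader(f))

for model, column in [("ChatGPT-4", "chatgpt4_correct"),
                      ("Orthopod", "orthopod_correct")]:
    correct, total, pct = percent_correct(rows, column)
    print(f"{model}: {correct}/{total} correct ({pct:.2f} %)")
```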
Results
Two hundred and seven questions were analyzed with both ChatGPT 4.0 and Orthopod. ChatGPT correctly answered 73.43 % (152/207) of the questions, while Orthopod correctly answered 71.01 % (147/207). Performance of the two models did not differ significantly by inclusion of media or by question category.
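The abstract does not name the statistical test behind the "no significant difference" finding; as an illustration only, one common way to compare two proportions such as 152/207 versus 147/207 is Fisher's exact test, sketched below from the reported counts.

```python
# Illustrative only: compare the two reported accuracy counts with
# Fisher's exact test (the abstract does not specify which test was used).
from scipy import stats

#         correct   incorrect
table = [[152, 207 - 152],   # ChatGPT 4.0: 152/207 correct
         [147, 207 - 147]]   # Orthopod:    147/207 correct

odds_ratio, p_value = stats.fisher_exact(table)
print(f"odds ratio = {odds_ratio:.3f}, p = {p_value:.3f}")
# A p-value well above 0.05 would be consistent with the reported
# finding of no significant difference between the two models.
```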
Conclusion
ChatGPT 4.0 and Orthopod correctly answered 73.43 % and 71.01 % of OITE practice questions, respectively. Both systems provided well-reasoned answers to multiple-choice questions. The thoughtfully articulated responses and well-supported explanations offered by both systems may prove to be a valuable educational resource for orthopaedic residents preparing for board-style exams.
Level of evidence
IV.
Journal Introduction:
Since its launch in 2003, The Surgeon has established itself as one of the leading multidisciplinary surgical titles, both in print and online. The Surgeon is published for the worldwide surgical and dental communities. The goal of the Journal is to achieve wider national and international recognition through a commitment to excellence in original research. In addition, both Colleges see the Journal as an important educational service, with a particular focus on post-graduate development. Much of this educational role will continue to be achieved through publishing expanded review articles by leaders in their field.
Articles in areas related to surgery and dentistry, such as healthcare management and education, are also welcomed. We aim to educate, entertain, give insight into new surgical techniques and technology, and provide a forum for debate and discussion.