Performance of ChatGPT on questions from the Brazilian College of Radiology annual resident evaluation test.

Radiologia Brasileira (JCR Q3, Medicine) · Pub Date: 2024-03-25 · eCollection Date: 2024-01-01 · DOI: 10.1590/0100-3984.2023.0083-en
Cleverson Alex Leitão, Gabriel Lucca de Oliveira Salvador, Leda Maria Rabelo, Dante Luiz Escuissato
{"title":"Performance of ChatGPT on questions from the Brazilian College of Radiology annual resident evaluation test.","authors":"Cleverson Alex Leitão, Gabriel Lucca de Oliveira Salvador, Leda Maria Rabelo, Dante Luiz Escuissato","doi":"10.1590/0100-3984.2023.0083-en","DOIUrl":null,"url":null,"abstract":"<p><strong>Objective: </strong>To test the performance of ChatGPT on radiology questions formulated by the Colégio Brasileiro de Radiologia (CBR, Brazilian College of Radiology), evaluating its failures and successes.</p><p><strong>Materials and methods: </strong>165 questions from the CBR annual resident assessment (2018, 2019, and 2022) were presented to ChatGPT. For statistical analysis, the questions were divided by the type of cognitive skills assessed (lower or higher order), by topic (physics or clinical), by subspecialty, by style (description of a clinical finding or sign, clinical management of a case, application of a concept, calculation/classification of findings, correlations between diseases, or anatomy), and by target academic year (all, second/third year, or third year only).</p><p><strong>Results: </strong>ChatGPT answered 88 (53.3%) of the questions correctly. It performed significantly better on the questions assessing lower-order cognitive skills than on those assessing higher-order cognitive skills, providing the correct answer on 38 (64.4%) of 59 questions and on only 50 (47.2%) of 106 questions, respectively (<i>p</i> = 0.01). The accuracy rate was significantly higher for physics questions than for clinical questions, correct answers being provided for 18 (90.0%) of 20 physics questions and for 70 (48.3%) of 145 clinical questions (<i>p</i> = 0.02). There was no significant difference in performance among the subspecialties or among the academic years (<i>p</i> > 0.05).</p><p><strong>Conclusion: </strong>Even without dedicated training in this field, ChatGPT demonstrates reasonable performance, albeit still insufficient for approval, on radiology questions formulated by the CBR.</p>","PeriodicalId":20842,"journal":{"name":"Radiologia Brasileira","volume":"57 ","pages":"e20230083"},"PeriodicalIF":0.0000,"publicationDate":"2024-03-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11236413/pdf/","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Radiologia Brasileira","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1590/0100-3984.2023.0083-en","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2024/1/1 0:00:00","PubModel":"eCollection","JCR":"Q3","JCRName":"Medicine","Score":null,"Total":0}
Citations: 0

Abstract

Objective: To test the performance of ChatGPT on radiology questions formulated by the Colégio Brasileiro de Radiologia (CBR, Brazilian College of Radiology), evaluating its failures and successes.

Materials and methods: A total of 165 questions from the CBR annual resident assessment (2018, 2019, and 2022) were presented to ChatGPT. For statistical analysis, the questions were divided by the type of cognitive skill assessed (lower or higher order), by topic (physics or clinical), by subspecialty, by style (description of a clinical finding or sign, clinical management of a case, application of a concept, calculation/classification of findings, correlations between diseases, or anatomy), and by target academic year (all, second/third year, or third year only).
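The abstract does not describe how the questions were submitted or scored (the public web interface is the likely route). The sketch below is a purely hypothetical illustration, assuming the OpenAI Python SDK and an invented question list, of how multiple-choice items tagged with the study's categories could be presented to the model and tallied.

# Hypothetical sketch only: not the authors' method. Assumes the OpenAI
# Python SDK and an invented, abbreviated question list.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

questions = [
    {
        "id": "2018-001",            # hypothetical identifier
        "text": "Question stem...\nA) ...\nB) ...\nC) ...\nD) ...",
        "answer": "C",
        "cognitive_skill": "lower",  # lower- vs higher-order
        "topic": "physics",          # physics vs clinical
        "subspecialty": "chest",
        "target_year": "all",
    },
    # ...remaining questions
]

results = []
for q in questions:
    reply = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system", "content": "Answer with a single letter: A, B, C, or D."},
            {"role": "user", "content": q["text"]},
        ],
    )
    chosen = reply.choices[0].message.content.strip()[0].upper()
    results.append({**q, "chatgpt_answer": chosen, "correct": chosen == q["answer"]})

overall = sum(r["correct"] for r in results) / len(results)
print(f"Overall accuracy: {overall:.1%}")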

Results: ChatGPT answered 88 (53.3%) of the questions correctly. It performed significantly better on the questions assessing lower-order cognitive skills than on those assessing higher-order cognitive skills, providing the correct answer on 38 (64.4%) of 59 questions and on only 50 (47.2%) of 106 questions, respectively (p = 0.01). The accuracy rate was significantly higher for physics questions than for clinical questions, correct answers being provided for 18 (90.0%) of 20 physics questions and for 70 (48.3%) of 145 clinical questions (p = 0.02). There was no significant difference in performance among the subspecialties or among the academic years (p > 0.05).
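The abstract does not name the statistical test behind the reported p-values. A chi-square or Fisher's exact test on the 2×2 tables of correct/incorrect counts is one conventional way to make such comparisons; the sketch below, using SciPy, is illustrative only and not necessarily the authors' analysis.

# Illustrative only: the abstract does not state which test produced the
# reported p-values. This shows one standard way to compare correct/incorrect
# counts between two groups of questions.
from scipy.stats import chi2_contingency, fisher_exact

# Lower-order (38/59 correct) vs higher-order (50/106 correct)
cognitive = [[38, 59 - 38], [50, 106 - 50]]
chi2, p, _, _ = chi2_contingency(cognitive)
print(f"Lower- vs higher-order: chi2 = {chi2:.2f}, p = {p:.3f}")

# Physics (18/20 correct) vs clinical (70/145 correct); the small cell counts
# make Fisher's exact test the safer choice here.
physics = [[18, 20 - 18], [70, 145 - 70]]
odds_ratio, p = fisher_exact(physics)
print(f"Physics vs clinical: odds ratio = {odds_ratio:.2f}, p = {p:.4f}")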

Conclusion: Even without dedicated training in this field, ChatGPT demonstrates reasonable performance on radiology questions formulated by the CBR, albeit still insufficient for a passing grade.

Source journal: Radiologia Brasileira (Medicine - Radiology, Nuclear Medicine and Imaging)
CiteScore: 2.60
Self-citation rate: 0.00%
Articles per year: 75
Review turnaround: 28 weeks