ChatGPT's Performance on Portuguese Medical Examination Questions: Comparative Analysis of ChatGPT-3.5 Turbo and ChatGPT-4o Mini.

Impact Factor: 3.2 · JCR Q1 (Education, Scientific Disciplines)
Filipe Prazeres
JMIR Medical Education, vol. 11, e65108. DOI: 10.2196/65108. Published 2025-03-05. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11902880/pdf/
Citations: 0

Abstract

Background: Advancements in ChatGPT are transforming medical education by providing new tools for assessment and learning, potentially enhancing evaluations for doctors and improving instructional effectiveness.

Objective: This study evaluates the performance and consistency of ChatGPT-3.5 Turbo and ChatGPT-4o mini in solving European Portuguese medical examination questions (2023 National Examination for Access to Specialized Training; Prova Nacional de Acesso à Formação Especializada [PNA]) and compares their performance to human candidates.

Methods: ChatGPT-3.5 Turbo was tested on the first part of the examination (74 questions) on July 18, 2024, and ChatGPT-4o mini on the second part (74 questions) on July 19, 2024. Each model generated an answer using its natural language processing capabilities. To test consistency, each model was asked, "Are you sure?" after providing an answer. Differences between the first and second responses of each model were analyzed using the McNemar test with continuity correction. A single-parameter t test compared the models' performance to human candidates. Frequencies and percentages were used for categorical variables, and means and CIs for numerical variables. Statistical significance was set at P<.05.
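The consistency check described above can be sketched as a McNemar test with continuity correction on each model's paired first and second responses. The 2×2 counts below are hypothetical placeholders, since the abstract does not report the full contingency table; only the test itself follows the stated method.

```python
import math

# Hypothetical paired contingency table for one model's first vs. second
# answers (the abstract does not report these counts):
#                 second correct   second wrong
# first correct        a=40             b=5
# first wrong          c=3              d=26
b, c = 5, 3  # only the discordant pairs enter the McNemar statistic

# McNemar chi-square with continuity correction (1 degree of freedom)
stat = (abs(b - c) - 1) ** 2 / (b + c)

# For a chi-square variable with 1 df, P(X > x) = erfc(sqrt(x / 2))
p_value = math.erfc(math.sqrt(stat / 2))

print(f"chi2 = {stat:.3f}, P = {p_value:.3f}")  # compare against P < .05
```

With these placeholder counts the discordant pairs are few, so the corrected statistic is small and the difference between first and second responses would not reach significance.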

Results: ChatGPT-4o mini achieved an accuracy rate of 65% (48/74) on the 2023 PNA examination, surpassing ChatGPT-3.5 Turbo. ChatGPT-4o mini outperformed medical candidates, while ChatGPT-3.5 Turbo had a more moderate performance.
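As a quick check of the reported figure, the 48/74 proportion and a 95% Wilson score interval can be computed as below. The Wilson interval is one common choice for a CI on a proportion; the abstract does not specify which interval the authors used, so this is an illustrative sketch, not their exact calculation.

```python
import math

correct, n = 48, 74  # ChatGPT-4o mini on the 2023 PNA examination
p_hat = correct / n
z = 1.96  # two-sided 95% confidence

# Wilson score interval for a binomial proportion
denom = 1 + z**2 / n
center = (p_hat + z**2 / (2 * n)) / denom
half = z * math.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom

low, high = center - half, center + half
print(f"accuracy = {p_hat:.0%}, 95% CI ({low:.1%}, {high:.1%})")
```

The point estimate (48/74 ≈ 64.9%) rounds to the 65% reported in the abstract.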

Conclusions: This study highlights the advancements and potential of ChatGPT models in medical education, emphasizing the need for careful implementation with teacher oversight and further research.

Source journal: JMIR Medical Education (Social Sciences, Education)
CiteScore: 6.90
Self-citation rate: 5.60%
Articles per year: 54
Review time: 8 weeks