Comparing ChatGPT and medical student performance in a real image-based Radiology and Applied Physics in Medicine exam

R. Salvador, D. Vas, L. Oleaga, M. Matute-González, À. Castillo-Fortuño, X. Setoain, C. Nicolau
{"title":"比较ChatGPT和医学生在基于真实图像的放射学和医学应用物理考试中的表现","authors":"R. Salvador ,&nbsp;D. Vas ,&nbsp;L. Oleaga ,&nbsp;M. Matute-González ,&nbsp;À. Castillo-Fortuño ,&nbsp;X. Setoain ,&nbsp;C. Nicolau","doi":"10.1016/j.rxeng.2025.101638","DOIUrl":null,"url":null,"abstract":"<div><h3>Introduction</h3><div>Artificial intelligence models can provide textual answers to a wide range of questions, including medical questions. Recently, these models have incorporated the ability to interpret and answer image-based questions, and this includes radiological images. The main objective of this study is to analyse the performance of ChatGPT-4o compared to third-year medical students in a Radiology and Applied Physics in Medicine practical exam. We also intend to assess the capacity of ChatGPT to interpret medical images and answer related questions.</div></div><div><h3>Materials and methods</h3><div>Thirty-three students set an exam of 10 questions on radiological and nuclear medicine images. Exactly the same exam in the same format was given to ChatGPT (version GPT-4) without prior training. The exam responses were evaluated by professors who were unaware of which exam corresponded to which respondent type. The Mann–Whitney <em>U</em> test was used to compare the results of the two groups.</div></div><div><h3>Results</h3><div>The students outperformed ChatGPT on eight questions. The students’ average final score was 7.78, while ChatGPT’s was 6.05, placing it in the 9th percentile of the students’ grade distribution.</div></div><div><h3>Discussion</h3><div>ChatGPT demonstrates competent performance in several areas, but students achieve better grades, especially in the interpretation of images and contextualised clinical reasoning, where students’ training and practical experience play an essential role. Improvements in AI models are still needed to achieve human-like capabilities in interpreting radiological images and integrating clinical information.</div></div>","PeriodicalId":94185,"journal":{"name":"Radiologia","volume":"67 4","pages":"Article 101638"},"PeriodicalIF":0.0000,"publicationDate":"2025-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Comparing ChatGPT and medical student performance in a real image-based Radiology and Applied Physics in Medicine exam\",\"authors\":\"R. Salvador ,&nbsp;D. Vas ,&nbsp;L. Oleaga ,&nbsp;M. Matute-González ,&nbsp;À. Castillo-Fortuño ,&nbsp;X. Setoain ,&nbsp;C. Nicolau\",\"doi\":\"10.1016/j.rxeng.2025.101638\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><h3>Introduction</h3><div>Artificial intelligence models can provide textual answers to a wide range of questions, including medical questions. Recently, these models have incorporated the ability to interpret and answer image-based questions, and this includes radiological images. The main objective of this study is to analyse the performance of ChatGPT-4o compared to third-year medical students in a Radiology and Applied Physics in Medicine practical exam. We also intend to assess the capacity of ChatGPT to interpret medical images and answer related questions.</div></div><div><h3>Materials and methods</h3><div>Thirty-three students set an exam of 10 questions on radiological and nuclear medicine images. Exactly the same exam in the same format was given to ChatGPT (version GPT-4) without prior training. 
The exam responses were evaluated by professors who were unaware of which exam corresponded to which respondent type. The Mann–Whitney <em>U</em> test was used to compare the results of the two groups.</div></div><div><h3>Results</h3><div>The students outperformed ChatGPT on eight questions. The students’ average final score was 7.78, while ChatGPT’s was 6.05, placing it in the 9th percentile of the students’ grade distribution.</div></div><div><h3>Discussion</h3><div>ChatGPT demonstrates competent performance in several areas, but students achieve better grades, especially in the interpretation of images and contextualised clinical reasoning, where students’ training and practical experience play an essential role. Improvements in AI models are still needed to achieve human-like capabilities in interpreting radiological images and integrating clinical information.</div></div>\",\"PeriodicalId\":94185,\"journal\":{\"name\":\"Radiologia\",\"volume\":\"67 4\",\"pages\":\"Article 101638\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2025-07-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Radiologia\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S2173510725000928\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Radiologia","FirstCategoryId":"1085","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S2173510725000928","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract


Introduction

Artificial intelligence models can provide textual answers to a wide range of questions, including medical questions. Recently, these models have incorporated the ability to interpret and answer image-based questions, including radiological images. The main objective of this study is to analyse the performance of ChatGPT-4o compared to third-year medical students in a Radiology and Applied Physics in Medicine practical exam. We also intend to assess the capacity of ChatGPT to interpret medical images and answer related questions.

Materials and methods

Thirty-three students sat an exam of 10 questions on radiological and nuclear medicine images. Exactly the same exam, in the same format, was given to ChatGPT (version GPT-4) without prior training. The exam responses were evaluated by professors who were blinded to which respondent type each exam came from. The Mann–Whitney U test was used to compare the results of the two groups.
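For readers who want to reproduce this kind of comparison, the sketch below runs a Mann–Whitney U test in Python with SciPy. The score arrays are invented placeholders (the abstract does not report individual scores), so this illustrates the statistical method, not the study's actual computation.

```python
# Minimal sketch of a two-group Mann-Whitney U comparison, as used in the study.
# The scores below are hypothetical placeholders on a 0-10 scale.
import numpy as np
from scipy.stats import mannwhitneyu

student_scores = np.array([8.5, 7.8, 9.0, 6.9, 7.5, 8.1, 7.2, 8.8, 6.5, 7.9])
chatgpt_scores = np.array([6.0, 6.3, 5.9, 6.1])  # e.g. scores from repeated runs

# Two-sided test: do the two score distributions differ?
u_stat, p_value = mannwhitneyu(student_scores, chatgpt_scores, alternative="two-sided")
print(f"U = {u_stat:.1f}, p = {p_value:.4f}")
```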

Results

The students outperformed ChatGPT on eight of the 10 questions. The students’ average final score was 7.78, while ChatGPT’s was 6.05, placing the model in the 9th percentile of the students’ grade distribution.
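The percentile placement quoted here can be computed from the full grade distribution. A minimal illustration with SciPy follows; the student grades are invented placeholders, and only ChatGPT's 6.05 is taken from the abstract.

```python
# Sketch of a percentile-rank computation: where a single score falls
# within a grade distribution. Student grades below are hypothetical.
from scipy.stats import percentileofscore

student_grades = [8.9, 8.4, 8.0, 7.9, 7.8, 7.6, 7.3, 7.0, 6.6, 6.2]
chatgpt_grade = 6.05  # ChatGPT's average final score, as reported

# Percentage of student grades at or below ChatGPT's grade.
pct = percentileofscore(student_grades, chatgpt_grade, kind="weak")
print(f"ChatGPT's score sits at the {pct:.0f}th percentile of the student distribution")
```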

Discussion

ChatGPT demonstrates competent performance in several areas, but students achieve better grades, especially in the interpretation of images and contextualised clinical reasoning, where students’ training and practical experience play an essential role. Improvements in AI models are still needed to achieve human-like capabilities in interpreting radiological images and integrating clinical information.