Comparing ChatGPT and medical student performance in a real image-based Radiology and Applied Physics in Medicine exam

R. Salvador, D. Vas, L. Oleaga, M. Matute-González, À. Castillo-Fortuño, X. Setoain, C. Nicolau

Radiologia, vol. 67, no. 4, Article 101638, July 2025. doi: 10.1016/j.rxeng.2025.101638
Introduction
Artificial intelligence models can provide textual answers to a wide range of questions, including medical questions. Recently, these models have incorporated the ability to interpret and answer image-based questions, including questions about radiological images. The main objective of this study is to analyse the performance of ChatGPT-4o compared with that of third-year medical students in a Radiology and Applied Physics in Medicine practical exam. We also aim to assess the capacity of ChatGPT to interpret medical images and answer related questions.
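As a rough illustration of how an image-based question can be posed to a multimodal model, the sketch below sends a question plus an image reference to GPT-4o through the OpenAI Python SDK. The paper does not specify how the exam was delivered to the model, and the question text, image URL and model name here are illustrative assumptions, not details taken from the study.

```python
# Hypothetical sketch: posing one image-based exam question to a multimodal
# model via the OpenAI Python SDK. The question text, image URL and model
# choice are placeholders, not details taken from the study.
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text",
                 "text": "Describe the imaging modality shown and give the most likely diagnosis."},
                {"type": "image_url",
                 "image_url": {"url": "https://example.org/exam/question1.png"}},
            ],
        }
    ],
)
print(response.choices[0].message.content)
```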
Materials and methods
Thirty-three students sat an exam of 10 questions on radiological and nuclear medicine images. The same exam, in the same format, was then given to ChatGPT (version GPT-4) without prior training. The exam responses were evaluated by professors who were blinded to whether each exam had been completed by a student or by ChatGPT. The Mann–Whitney U test was used to compare the results of the two groups.
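A minimal sketch of the statistical comparison named above, assuming the two groups are represented as lists of scores; the numbers below are invented for illustration and are not the study's data.

```python
# Mann–Whitney U test comparing two independent score samples with SciPy.
# The score lists are hypothetical placeholders, not the study's data.
from scipy.stats import mannwhitneyu

student_scores = [7.5, 8.0, 6.5, 9.0, 7.0, 8.5, 7.8, 6.9]   # hypothetical
chatgpt_scores = [6.0, 5.5, 6.5, 6.2, 5.8]                   # hypothetical

# Two-sided test: do the two score distributions differ?
u_stat, p_value = mannwhitneyu(student_scores, chatgpt_scores,
                               alternative="two-sided")
print(f"U = {u_stat:.1f}, p = {p_value:.4f}")
```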
Results
The students outperformed ChatGPT on eight of the 10 questions. The students’ average final score was 7.78, while ChatGPT’s was 6.05, placing it in the 9th percentile of the students’ grade distribution.
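For context on the percentile figure, the fragment below shows one way such a placement can be computed, assuming access to the individual students' final scores; the distribution here is invented for illustration only.

```python
# Percentile placement of a single score within a grade distribution.
# The student scores below are hypothetical, not the study's data.
from scipy.stats import percentileofscore

student_final_scores = [5.9, 6.4, 6.8, 7.2, 7.5, 7.8, 8.1, 8.4, 8.8, 9.2]  # hypothetical
chatgpt_final_score = 6.05

# kind="weak" counts the fraction of scores <= the given score.
pct = percentileofscore(student_final_scores, chatgpt_final_score, kind="weak")
print(f"ChatGPT's score sits at about the {pct:.0f}th percentile of this sample")
```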
Discussion
ChatGPT demonstrates competent performance in several areas, but the students achieved better grades, especially in image interpretation and contextualised clinical reasoning, where their training and practical experience play an essential role. Improvements in AI models are still needed to reach human-like capabilities in interpreting radiological images and integrating clinical information.