Performance evaluation of ChatGPT-4.0 and Gemini on image-based neurosurgery board practice questions: A comparative analysis

IF 1.9, Zone 4 (Medicine), JCR Q3 (Clinical Neurology)
Alana M. McNulty, Harshitha Valluri, Avi A. Gajjar, Amanda Custozzo, Nicholas C. Field, Alexandra R. Paul
Journal of Clinical Neuroscience, volume 134, Article 111097. Journal Article, published 2025-02-11.
DOI: 10.1016/j.jocn.2025.111097
https://www.sciencedirect.com/science/article/pii/S0967586825000694

Abstract

Introduction

Artificial intelligence (AI) has gained significant attention in medicine, particularly in neurosurgery, where its potential is frequently discussed and occasionally feared. Large language models (LLMs), such as ChatGPT-4.0 (OpenAI) and Gemini (Google DeepMind), have shown promise in text-based tasks but remain underexplored in image-based domains, which are essential for neurosurgery. This study evaluates the performance of ChatGPT-4.0 and Gemini on image-based neurosurgery board practice questions, focusing on their ability to interpret visual data, a critical aspect of neurosurgical decision-making.

Methods

A total of 250 image-based questions were selected from two neurosurgical review textbooks. Each question was presented to both ChatGPT-4.0 and Gemini in its original format, including images such as MRI scans, pathology slides, and surgical visuals. The models were asked to answer each question, and accuracy was measured as the number of correct responses.

Results

ChatGPT-4.0 correctly answered 84 questions (33.6%), significantly outperforming Gemini, which answered only 1 question correctly (0.4%; p < 0.0001). ChatGPT-4.0 answered 17.7% of the questions from The Comprehensive Neurosurgery Board Preparation Book correctly, compared with 50.0% of those from Neurosurgery Board Review. Gemini exhibited a 17.8% “inability response” rate, explicitly stating that it could not interpret images. Although the gap between the two models was significant, both results highlight the models' limitations in handling complex visual data.
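The percentages and the order of magnitude of the p-value above can be checked with a short calculation. The abstract does not name the statistical test, so the sketch below assumes a two-sided Fisher's exact test on the 2×2 table of correct and incorrect answers; the function names are illustrative and not taken from the paper. Because both models answered the same questions, a paired test such as McNemar's may be more appropriate, but it requires per-question outcomes that the abstract does not report.

```python
from math import comb

TOTAL_QUESTIONS = 250                      # reported in Methods
CORRECT = {"ChatGPT-4.0": 84, "Gemini": 1}  # reported in Results


def accuracy_percent(n_correct, n_total):
    """Share of correct answers, as a percentage."""
    return 100.0 * n_correct / n_total


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]].

    Assumption: this is a stand-in for whichever test the authors used.
    It sums the hypergeometric probabilities of all tables that are no
    more likely than the observed one, with the margins held fixed.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Probability that the top-left cell equals x.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Small relative tolerance guards against floating-point ties.
    return sum(
        prob(x)
        for x in range(lowest, highest + 1)
        if prob(x) <= p_observed * (1 + 1e-9)
    )


gpt_correct = CORRECT["ChatGPT-4.0"]
gemini_correct = CORRECT["Gemini"]
p_value = fisher_exact_two_sided(
    gpt_correct, TOTAL_QUESTIONS - gpt_correct,
    gemini_correct, TOTAL_QUESTIONS - gemini_correct,
)

for model, n_correct in CORRECT.items():
    print(f"{model}: {accuracy_percent(n_correct, TOTAL_QUESTIONS):.1f}% correct")
print(f"Two-sided Fisher's exact p-value: {p_value:.3g}")
```

Under this assumed test, the p-value falls far below the reported threshold of 0.0001, which is consistent with the abstract's claim.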

Conclusions

While ChatGPT-4.0 demonstrated some capacity to interpret image-based neurosurgery board questions, both models exhibited significant limitations, particularly in processing and analyzing complex visual data. These findings emphasize the need for targeted advancements in AI to improve visual interpretation in neurosurgical education and practice.
Source journal
Journal of Clinical Neuroscience (Medicine, Clinical Neurology)
CiteScore: 4.50
Self-citation rate: 0.00%
Articles published: 402
Review time: 40 days
Journal description: The Journal of Clinical Neuroscience is an international journal that publishes articles on clinical neurosurgery and neurology and the related neurosciences, such as neuropathology, neuroradiology, neuro-ophthalmology and neurophysiology. The journal has a broad international perspective and emphasises advances occurring in Asia, the Pacific Rim region, Europe and North America. It acts as a focus for the publication of major clinical and laboratory research, and also publishes solicited manuscripts on specific subjects from experts, case reports and other information of interest to clinicians working in the clinical neurosciences.