Transforming Neurosurgical Practice with Large Language Models: Comparative Performance of ChatGPT-Omni and Gemini in Complex Case Management.

IF 1.9 · Zone 4 (Medicine) · JCR Q3 (CLINICAL NEUROLOGY)
Baris Colluoglu, Samil Dikici
World Neurosurgery, article 124103. Journal Article. Published 2025-05-20. DOI: 10.1016/j.wneu.2025.124103 (https://doi.org/10.1016/j.wneu.2025.124103)

Abstract

Background: Recent advancements in artificial intelligence, particularly in large language models (LLMs), have catalyzed new opportunities within medical domains, including neurosurgery. This study aims to evaluate and compare the performance of two advanced LLMs, ChatGPT-Omni and Gemini, in addressing clinical case inquiries on various neurosurgical conditions.

Methods: A prospective observational study was conducted utilizing 500 case-based questions relevant to neurosurgery, covering 10 prevalent conditions. The questions were designed to simulate real-world clinical scenarios encompassing diagnosis, interpretation, and management, and were asked again two months later (Phase 2). Responses were evaluated using a 6-point Likert scale by two independent neurosurgeons.

Results: ChatGPT-Omni exhibited consistent superiority across all evaluation metrics. In Phase 1, its overall average score across all conditions was 5.38±0.12, which increased to 5.46±0.08 in Phase 2 (p<0.001). While exhibiting moderate improvements, Gemini trailed behind ChatGPT-Omni with an overall average score of 4.93±0.15 in Phase 1, which improved to 5.1±0.14 in Phase 2 (p<0.001). Subgroup analyses indicated that ChatGPT-Omni provided superior contextual accuracy across all conditions (p<0.001).

Conclusion: The study underscores the transformative potential of LLMs in neurosurgery, with ChatGPT-Omni demonstrating superior accuracy, relevance, and clarity compared to Gemini. While both models improved over time, ChatGPT-Omni consistently excelled across all clinical scenarios, highlighting its potential utility in neurosurgical decision support and education.

Source journal: World Neurosurgery (CLINICAL NEUROLOGY-SURGERY)
CiteScore: 3.90
Self-citation rate: 15.00%
Articles published: 1765
Review time: 47 days
Journal description: World Neurosurgery has an open access mirror journal, World Neurosurgery: X, sharing the same aims and scope, editorial team, submission system, and rigorous peer review. The journal's mission is:
- To provide a first-class international forum and a 2-way conduit for dialogue that is relevant to neurosurgeons and providers who care for neurosurgery patients. The categories of the exchanged information include clinical and basic science, as well as global information that provides social, political, educational, economic, cultural, or societal insights and knowledge that are of significance and relevance to worldwide neurosurgery patient care.
- To act as a primary intellectual catalyst for the stimulation of creativity, the creation of new knowledge, and the enhancement of quality neurosurgical care worldwide.
- To provide a forum for communication that enriches the lives of all neurosurgeons and their colleagues and, in so doing, enriches the lives of their patients.
Topics to be addressed in World Neurosurgery include: EDUCATION, ECONOMICS, RESEARCH, POLITICS, HISTORY, CULTURE, CLINICAL SCIENCE, LABORATORY SCIENCE, TECHNOLOGY, OPERATIVE TECHNIQUES, CLINICAL IMAGES, VIDEOS