Transforming Neurosurgical Practice with Large Language Models: Comparative Performance of ChatGPT-Omni and Gemini in Complex Case Management.

IF 1.9, Zone 4 (Medicine), Q3 CLINICAL NEUROLOGY

World neurosurgery Pub Date : 2025-05-20 DOI:10.1016/j.wneu.2025.124103

Baris Colluoglu, Samil Dikici

Citations: 0

Abstract

Background: Recent advancements in artificial intelligence, particularly in large language models (LLMs), have catalyzed new opportunities within medical domains, including neurosurgery. This study aims to evaluate and compare the performance of two advanced LLMs, ChatGPT-Omni and Gemini, in addressing clinical case inquiries on various neurosurgical conditions.

Methods: A prospective observational study was conducted using 500 case-based questions relevant to neurosurgery, covering 10 prevalent conditions. The questions were designed to simulate real-world clinical scenarios encompassing diagnosis, interpretation, and management, and were asked again two months later (Phase 2). Responses were evaluated on a 6-point Likert scale by two independent neurosurgeons.

Results: ChatGPT-Omni exhibited consistent superiority across all evaluation metrics. In Phase 1, its overall average score across all conditions was 5.38±0.12, which increased to 5.46±0.08 in Phase 2 (p<0.001). While exhibiting moderate improvements, Gemini trailed behind ChatGPT-Omni with an overall average score of 4.93±0.15 in Phase 1, which improved to 5.1±0.14 in Phase 2 (p<0.001). Subgroup analyses indicated that ChatGPT-Omni provided superior contextual accuracy across all conditions (p<0.001).

Conclusion: The study underscores the transformative potential of LLMs in neurosurgery, with ChatGPT-Omni demonstrating superior accuracy, relevance, and clarity compared to Gemini. While both models improved over time, ChatGPT-Omni consistently excelled across all clinical scenarios, highlighting its potential utility in neurosurgical decision support and education.

Source journal

World neurosurgery CLINICAL NEUROLOGY-SURGERY

CiteScore: 3.90
Self-citation rate: 15.00%
Articles published: 1765
Review time: 47 days

About the journal: World Neurosurgery has an open access mirror journal, World Neurosurgery: X, sharing the same aims and scope, editorial team, submission system, and rigorous peer review. The journal's mission is:
- To provide a first-class international forum and a two-way conduit for dialogue that is relevant to neurosurgeons and providers who care for neurosurgery patients. The categories of exchanged information include clinical and basic science, as well as global information that provides social, political, educational, economic, cultural, or societal insights and knowledge of significance and relevance to worldwide neurosurgery patient care.
- To act as a primary intellectual catalyst for the stimulation of creativity, the creation of new knowledge, and the enhancement of quality neurosurgical care worldwide.
- To provide a forum for communication that enriches the lives of all neurosurgeons and their colleagues and, in so doing, enriches the lives of their patients.
Topics to be addressed in World Neurosurgery include: EDUCATION, ECONOMICS, RESEARCH, POLITICS, HISTORY, CULTURE, CLINICAL SCIENCE, LABORATORY SCIENCE, TECHNOLOGY, OPERATIVE TECHNIQUES, CLINICAL IMAGES, VIDEOS