Transforming Neurosurgical Practice with Large Language Models: Comparative Performance of ChatGPT-Omni and Gemini in Complex Case Management
Authors: Baris Colluoglu, Samil Dikici
Journal: World Neurosurgery, p. 124103 (Journal Article; impact factor 1.9; JCR Q3, Clinical Neurology)
Published: 2025-05-20
DOI: 10.1016/j.wneu.2025.124103 (https://doi.org/10.1016/j.wneu.2025.124103)
Citations: 0
Abstract
Background: Recent advancements in artificial intelligence, particularly in large language models (LLMs), have catalyzed new opportunities within medical domains, including neurosurgery. This study aims to evaluate and compare the performance of two advanced LLMs, ChatGPT-Omni and Gemini, in addressing clinical case inquiries on various neurosurgical conditions.
Methods: A prospective observational study was conducted using 500 case-based questions relevant to neurosurgery, covering 10 prevalent conditions. The questions were designed to simulate real-world clinical scenarios encompassing diagnosis, interpretation, and management, and were asked again two months later (Phase 2). Responses were evaluated by two independent neurosurgeons using a 6-point Likert scale.
Results: ChatGPT-Omni outperformed Gemini on every evaluation metric. Its overall average score across all conditions was 5.38±0.12 in Phase 1 and rose to 5.46±0.08 in Phase 2 (p<0.001). Gemini improved moderately but still trailed ChatGPT-Omni, with an overall average score of 4.93±0.15 in Phase 1 and 5.1±0.14 in Phase 2 (p<0.001). Subgroup analyses indicated that ChatGPT-Omni provided superior contextual accuracy across all conditions (p<0.001).
Conclusion: The study underscores the transformative potential of LLMs in neurosurgery, with ChatGPT-Omni demonstrating superior accuracy, relevance, and clarity compared to Gemini. While both models improved over time, ChatGPT-Omni consistently excelled across all clinical scenarios, highlighting its potential utility in neurosurgical decision support and education.
About the journal:
World Neurosurgery has an open access mirror journal World Neurosurgery: X, sharing the same aims and scope, editorial team, submission system and rigorous peer review.
The journal's mission is to:
- Provide a first-class international forum and a 2-way conduit for dialogue that is relevant to neurosurgeons and providers who care for neurosurgery patients. The categories of exchanged information include clinical and basic science, as well as global information that provides social, political, educational, economic, cultural or societal insights and knowledge of significance and relevance to worldwide neurosurgery patient care.
- Act as a primary intellectual catalyst for the stimulation of creativity, the creation of new knowledge, and the enhancement of quality neurosurgical care worldwide.
- Provide a forum for communication that enriches the lives of all neurosurgeons and their colleagues and, in so doing, enriches the lives of their patients.
Topics to be addressed in World Neurosurgery include: EDUCATION, ECONOMICS, RESEARCH, POLITICS, HISTORY, CULTURE, CLINICAL SCIENCE, LABORATORY SCIENCE, TECHNOLOGY, OPERATIVE TECHNIQUES, CLINICAL IMAGES, VIDEOS