Transforming neurosurgical practice with large language models: comparative performance of ChatGPT-omni and Gemini in complex case management.
Authors: Barış Çöllüoğlu, Şamil Dikici
Journal: Journal of Neurosurgical Sciences (impact factor 1.3; JCR Q4, Clinical Neurology)
Publication type: Journal Article
Publication date: 2025-06-05
DOI: https://doi.org/10.23736/S0390-5616.25.06447-1
Background: Recent advancements in artificial intelligence, particularly in large language models (LLMs), have catalyzed new opportunities within medical domains, including neurosurgery. This study aims to evaluate and compare the performance of two advanced LLMs, ChatGPT-Omni and Gemini, in addressing clinical case inquiries on various neurosurgical conditions.
Methods: A prospective observational study was conducted using 500 case-based questions relevant to neurosurgery, covering 10 prevalent conditions. The questions were designed to simulate real-world clinical scenarios encompassing diagnosis, interpretation, and management, and were asked again two months later (Phase 2). Responses were evaluated by two independent neurosurgeons using a 6-point Likert scale.
Results: ChatGPT-Omni exhibited consistent superiority across all evaluation metrics. In Phase 1, its overall average score across all conditions was 5.38±0.12, which increased to 5.46±0.08 in Phase 2 (P<0.001). While exhibiting moderate improvements, Gemini trailed behind ChatGPT-Omni with an overall average score of 4.93±0.15 in Phase 1, which improved to 5.1±0.14 in Phase 2 (P<0.001). Subgroup analyses indicated that ChatGPT-Omni provided superior contextual accuracy across all conditions (P<0.001).
Conclusions: The study underscores the transformative potential of LLMs in neurosurgery, with ChatGPT-Omni demonstrating superior accuracy, relevance, and clarity compared to Gemini. While both models improved over time, ChatGPT-Omni consistently excelled across all clinical scenarios, highlighting its potential utility in neurosurgical decision support and education.
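As a rough illustration of the scoring described in the Methods, the sketch below shows one way two-rater Likert scores could be reduced to an overall "mean ± SD" figure. The abstract does not say how its ± values were computed, so the aggregation here (average the two raters per question, average per condition, then take the mean and sample standard deviation across conditions) is an assumption, and the ratings are invented purely for demonstration; they are not data from the study.

```python
from statistics import mean, stdev

# Hypothetical ratings (NOT from the study): condition -> list of
# (rater_a, rater_b) scores on a 1-6 Likert scale, one pair per question.
ratings = {
    "condition_a": [(6, 5), (5, 5), (6, 6)],
    "condition_b": [(5, 4), (6, 5), (5, 5)],
}

def condition_means(ratings):
    """Average the two raters per question, then average over questions."""
    return {
        name: mean((a + b) / 2 for a, b in pairs)
        for name, pairs in ratings.items()
    }

def overall_summary(ratings):
    """Return (mean, sample SD) of the per-condition means (assumed scheme)."""
    per_condition = list(condition_means(ratings).values())
    return mean(per_condition), stdev(per_condition)

if __name__ == "__main__":
    avg, sd = overall_summary(ratings)
    print(f"overall score: {avg:.2f}±{sd:.2f}")
```

Other aggregation schemes are equally plausible, for example pooling all 500 question-level scores before computing the standard deviation, and they would give different ± values from the same ratings.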
About the journal:
The Journal of Neurosurgical Sciences publishes scientific papers on neurosurgery and related subjects (electroencephalography, neurophysiology, neurochemistry, neuropathology, stereotaxy, neuroanatomy, neuroradiology, etc.). Manuscripts may be submitted in the form of editorials, original articles, review articles, special articles, letters to the Editor and guidelines. The journal aims to provide its readers with papers of the highest quality and impact through a process of careful peer review and editorial work.