Subthalamic nucleus or globus pallidus internus deep brain stimulation for the treatment of Parkinson's disease: An artificial intelligence approach
David Shin, Timothy Tang, Joel Carson, Rekha Isaac, Chandler Dinh, Daniel Im, Andrew Fay, Asael Isaac, Stephen Cho, Zachary Brandt, Kai Nguyen, Isabel Shaffrey, Vahe Yacoubian, Taha M. Taka, Samantha Spellicy, Miguel Angel Lopez-Gonzalez, Olumide Danisa
Journal of Clinical Neuroscience, Volume 138, Article 111393. Published 2025-06-18. DOI: 10.1016/j.jocn.2025.111393
Abstract
Background
The content of generative artificial intelligence (AI) responses concerning deep brain stimulation (DBS) is currently unvalidated. This study analyzed AI responses to questions and recommendations derived from the 2018 Congress of Neurological Surgeons (CNS) guidelines on subthalamic nucleus and globus pallidus internus DBS for the treatment of patients with Parkinson's disease.
Methods
Seven questions were generated from the CNS guidelines and posed to ChatGPT 4o, Perplexity, Copilot, and Gemini. Answers were graded "concordant" if they highlighted all points provided by the CNS guidelines; otherwise, answers were graded "non-concordant" and sub-categorized as either "insufficient" or "over-conclusive." AI responses were evaluated for readability via the Flesch-Kincaid Grade Level, Gunning Fog Index, Simple Measure of Gobbledygook (SMOG) Index, and Flesch Reading Ease tests.
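The four readability tests are standard formulas over word, sentence, and syllable counts. The paper does not state which tool computed its scores; the sketch below implements the published formulas directly, using a naive vowel-group syllable counter as a stand-in for a proper syllable dictionary (in practice, a package such as textstat applies the same formulas with more careful counting).

```python
import re
import math

def count_syllables(word: str) -> int:
    """Approximate syllables as runs of vowels (a crude but common heuristic)."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def readability(text: str) -> dict:
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = re.findall(r"[A-Za-z']+", text)
    n_words = max(1, len(words))
    syllables = sum(count_syllables(w) for w in words)
    complex_words = sum(1 for w in words if count_syllables(w) >= 3)

    wps = n_words / sentences   # words per sentence
    spw = syllables / n_words   # syllables per word
    return {
        # Flesch-Kincaid Grade Level: U.S. school grade needed to follow the text
        "fk_grade": 0.39 * wps + 11.8 * spw - 15.59,
        # Gunning Fog Index: years of education, driven by long-word density
        "gunning_fog": 0.4 * (wps + 100 * complex_words / n_words),
        # SMOG Index: polysyllable count scaled to a 30-sentence sample
        "smog": 1.0430 * math.sqrt(complex_words * 30 / sentences) + 3.1291,
        # Flesch Reading Ease: higher is easier; scores near 30 read as "difficult"
        "flesch_ease": 206.835 - 1.015 * wps - 84.6 * spw,
    }

print(readability("The subthalamic nucleus is a target for deep brain stimulation."))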
Results
ChatGPT 4o showed 42.9% concordance, with non-concordant responses classified as 14.3% insufficient and 42.8% over-conclusive. Perplexity displayed a 28.6% concordance rate, with 14.3% insufficient and 57.1% over-conclusive responses. Copilot showed 28.6% concordance, with 28.6% insufficient and 42.8% over-conclusive responses. Gemini likewise demonstrated 28.6% concordance, with 28.6% insufficient and 42.8% over-conclusive responses. Flesch-Kincaid Grade Level scores ranged from 14.44 (Gemini) to 18.94 (Copilot), Gunning Fog Index scores from 17.9 (Gemini) to 22.06 (Copilot), and SMOG Index scores from 16.54 (Gemini) to 19.67 (Copilot); all Flesch Reading Ease scores were low, with Gemini showing the highest score at 30.91.
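Each reported percentage corresponds to a count out of the seven guideline questions (1/7 ≈ 14.3%, 2/7 ≈ 28.6%, 3/7 ≈ 42.9%, 4/7 ≈ 57.1%). The sketch below reconstructs the per-model counts implied by those percentages; the counts themselves are inferred, not published, and the paper appears to round one 3/7 figure to 42.8% so each triplet sums to 100%.

```python
# Per-model (concordant, insufficient, over-conclusive) counts, inferred from
# the percentages reported over seven guideline questions.
counts = {
    "ChatGPT 4o": (3, 1, 3),
    "Perplexity": (2, 1, 4),
    "Copilot":    (2, 2, 3),
    "Gemini":     (2, 2, 3),
}

for model, (conc, insuf, over) in counts.items():
    n = conc + insuf + over  # seven questions per model
    print(f"{model}: {conc / n:.1%} concordant, "
          f"{insuf / n:.1%} insufficient, {over / n:.1%} over-conclusive")
```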
Conclusion
ChatGPT 4o displayed the highest concordance, Perplexity the highest over-conclusive rate, and Copilot and Gemini the most insufficient answers. All responses were written at a complex reading level. Despite the possible benefits of future developments and innovation in AI capabilities, AI requires further improvement before it can be used independently in clinical DBS practice.
Journal Introduction:
This international journal, the Journal of Clinical Neuroscience, publishes articles on clinical neurosurgery and neurology and the related neurosciences, such as neuropathology, neuroradiology, neuro-ophthalmology and neurophysiology.
The journal has a broad international perspective and emphasises advances occurring in Asia, the Pacific Rim region, Europe and North America. The journal acts as a focus for the publication of major clinical and laboratory research, as well as publishing solicited manuscripts on specific subjects from experts, case reports, and other information of interest to clinicians working in the clinical neurosciences.