Subthalamic nucleus or globus pallidus internus deep brain stimulation for the treatment of Parkinson's disease: An artificial intelligence approach

IF 1.9 · JCR Q3, Clinical Neurology · Region 4 (Medicine)
David Shin, Timothy Tang, Joel Carson, Rekha Isaac, Chandler Dinh, Daniel Im, Andrew Fay, Asael Isaac, Stephen Cho, Zachary Brandt, Kai Nguyen, Isabel Shaffrey, Vahe Yacoubian, Taha M. Taka, Samantha Spellicy, Miguel Angel Lopez-Gonzalez, Olumide Danisa
DOI: 10.1016/j.jocn.2025.111393
Journal: Journal of Clinical Neuroscience, Volume 138, Article 111393
Published: 2025-06-18 (Journal Article)
URL: https://www.sciencedirect.com/science/article/pii/S0967586825003662
Citations: 0

Abstract

Background

The content that generative artificial intelligence (AI) produces about deep brain stimulation (DBS) has not yet been validated. This study analyzed AI responses to questions and recommendations from the 2018 Congress of Neurological Surgeons (CNS) guidelines on subthalamic nucleus and globus pallidus internus DBS for the treatment of patients with Parkinson's disease.

Methods

Seven questions were generated from the CNS guidelines and posed to ChatGPT 4o, Perplexity, Copilot, and Gemini. Answers were "concordant" if they highlighted all points provided by the CNS guidelines; otherwise, answers were considered "non-concordant" and sub-categorized as either "insufficient" or "over-conclusive." AI responses were evaluated for readability via the Flesch-Kincaid Grade Level, Gunning Fog Index, Simple Measure of Gobbledygook (SMOG) Index, and Flesch Reading Ease tests.
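The four readability measures named above are standard count-based formulas. The abstract does not say which tool computed them, so as a rough illustration only, they can be evaluated directly from word, sentence, syllable, and complex-word counts:

```python
import math

def flesch_kincaid_grade(words, sentences, syllables):
    # FKGL = 0.39 * (words/sentences) + 11.8 * (syllables/words) - 15.59
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59

def flesch_reading_ease(words, sentences, syllables):
    # FRE = 206.835 - 1.015 * (words/sentences) - 84.6 * (syllables/words)
    # Lower scores mean harder text; the abstract's scores (all < 31) are "very difficult".
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)

def gunning_fog(words, sentences, complex_words):
    # Fog = 0.4 * (average sentence length + percentage of complex words)
    # "Complex" conventionally means three or more syllables.
    return 0.4 * ((words / sentences) + 100.0 * (complex_words / words))

def smog_index(polysyllables, sentences):
    # SMOG = 1.0430 * sqrt(polysyllables * 30 / sentences) + 3.1291
    return 1.0430 * math.sqrt(polysyllables * 30.0 / sentences) + 3.1291
```

For example, a 100-word, 5-sentence passage with 180 syllables yields an FKGL of about 13.5, i.e. college-level reading; the scores reported in this study (14.4 to 18.9) indicate even more demanding text.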

Results

ChatGPT 4o showcased 42.9% concordance, with non-concordant responses classified as 14.3% insufficient and 42.8% over-conclusive. Perplexity displayed a 28.6% concordance rate, with 14.3% insufficient and 57.1% over-conclusive responses. Copilot showed 28.6% concordance, with 28.6% insufficient and 42.8% over-conclusive responses. Gemini demonstrated 28.6% concordance, with 28.6% insufficient and 42.8% over-conclusive responses. The Flesch-Kincaid Grade Level scores ranged from 14.44 (Gemini) to 18.94 (Copilot), Gunning Fog Index scores varied between 17.9 (Gemini) and 22.06 (Copilot), SMOG Index scores ranged from 16.54 (Gemini) to 19.67 (Copilot), and all Flesch Reading Ease scores were low, with Gemini showing the highest score of 30.91.
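With seven questions per model, each reported percentage corresponds to a whole-number count out of 7. The counts below are inferred from the rounded percentages (an assumption, not stated in the abstract); a quick sanity check:

```python
QUESTIONS = 7

# Inferred counts per model: (concordant, insufficient, over-conclusive).
counts = {
    "ChatGPT 4o": (3, 1, 3),
    "Perplexity": (2, 1, 4),
    "Copilot":    (2, 2, 3),
    "Gemini":     (2, 2, 3),
}

for model, (con, ins, over) in counts.items():
    # Every question falls into exactly one category.
    assert con + ins + over == QUESTIONS
    print(model, [round(100 * c / QUESTIONS, 1) for c in (con, ins, over)])
```

Note that 3/7 rounds to 42.9%; the abstract reports 42.8% for the over-conclusive categories, apparently so that each model's three shares sum to 100%.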

Conclusion

ChatGPT 4o displayed the highest concordance, Perplexity the highest over-conclusive rate, and Copilot and Gemini the most insufficient answers. All responses exhibited complex readability. Despite the possible benefits of future developments and innovation in AI capabilities, AI requires further improvement before independent clinical use in DBS.
Source journal: Journal of Clinical Neuroscience (Medicine, Clinical Neurology)
CiteScore: 4.50
Self-citation rate: 0.00%
Annual articles: 402
Review time: 40 days
Journal description: This international journal, the Journal of Clinical Neuroscience, publishes articles on clinical neurosurgery and neurology and the related neurosciences, such as neuropathology, neuroradiology, neuro-ophthalmology, and neurophysiology. The journal has a broad international perspective and emphasises the advances occurring in Asia, the Pacific Rim region, Europe, and North America. It acts as a focus for publication of major clinical and laboratory research, as well as publishing solicited manuscripts on specific subjects from experts, case reports, and other information of interest to clinicians working in the clinical neurosciences.