Human versus machine: deciding on high-stakes surgery in possible Cauda Equina syndrome.

IF 4.7 | CAS Zone 1 (Medicine) | JCR Q1, Clinical Neurology
Elie Najjar, Ahmed Abdelazim Hassan, Rodrigo Muscogliati, Khalid M Salem, Nasir A Quraishi
Spine Journal | Journal Article | Published 2025-05-08
DOI: 10.1016/j.spinee.2025.05.026 (https://doi.org/10.1016/j.spinee.2025.05.026)
Citations: 0

Abstract

Background context: Cauda Equina Syndrome (CES) is a spinal surgical emergency requiring prompt intervention to prevent neurological deficits. Accurate identification of CES cases needing urgent surgery is essential to avoid long-term sequelae.

Purpose: To evaluate the concordance between an AI language model (ChatGPT) and a Spinal Multidisciplinary Team (MDT) in recommending surgical intervention for suspected CES cases.

Study design/setting: Retrospective concordance analysis comparing surgical recommendations between ChatGPT and a Spinal MDT.

Patient sample: Among 160 referrals presenting with red flags for possible CES, 10 cases were used to calibrate ChatGPT to specific clinical and diagnostic parameters, with the remaining 150 cases included in the primary analysis. The average patient age was 50.6 years (range 18-87), with a male-to-female ratio of 68:82.

Outcome measures: The primary outcome was the concordance rate between ChatGPT and the MDT in recommending surgery, evaluated through agreement rates and statistical analysis.
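For reference, the standard definitions behind these agreement statistics are sketched below for a 2×2 table of binary surgery/no-surgery recommendations. The cell labels a, b, c, d are notation introduced here rather than taken from the paper, and the abstract does not say whether McNemar's test was run with a continuity correction, without one, or as an exact binomial test.

```latex
% Rows: ChatGPT (surgery / no surgery); columns: MDT (surgery / no surgery).
% a = both surgery, b = ChatGPT surgery & MDT no surgery,
% c = ChatGPT no surgery & MDT surgery, d = both no surgery, n = a + b + c + d.
\begin{aligned}
p_o &= \frac{a + d}{n}, \qquad
p_e = \frac{(a+b)(a+c) + (c+d)(b+d)}{n^2}, \qquad
\kappa = \frac{p_o - p_e}{1 - p_e}, \\[4pt]
\chi^2_{\mathrm{McNemar}} &= \frac{(b - c)^2}{b + c}
\quad \text{or, with continuity correction,} \quad
\frac{(\lvert b - c \rvert - 1)^2}{b + c}, \qquad \text{df} = 1.
\end{aligned}
```

Only the discordant cells b and c enter McNemar's statistic, which is why it tests whether one rater systematically recommends surgery more often than the other, whereas κ summarizes overall agreement corrected for chance.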

Methods: Each of the 150 cases was presented as standardized slides including clinical history, imaging, and examination findings. Both the MDT and ChatGPT assessed the need for urgent surgery. Discordant cases (n=17) were further reviewed by 3 spinal surgeons blinded to prior decisions.
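The case flow described in the methods (compare the two recommendations, then send only disagreements to blinded reviewers) can be sketched in Python as below. The record layout, the function names, and the toy cases are hypothetical illustrations, not the authors' data or code. The abstract also does not define "consensus"; this sketch assumes it means all three reviewers cast the same vote.

```python
from dataclasses import dataclass


@dataclass
class Case:
    # Hypothetical record layout; the paper does not describe its data format.
    case_id: int
    mdt_surgery: bool                       # MDT recommends urgent surgery
    chatgpt_surgery: bool                   # ChatGPT recommends urgent surgery
    reviewer_votes: tuple[bool, ...] = ()   # blinded surgeons' votes (discordant cases only)


def split_concordant(cases):
    """Partition cases by whether ChatGPT and the MDT agree."""
    concordant = [c for c in cases if c.mdt_surgery == c.chatgpt_surgery]
    discordant = [c for c in cases if c.mdt_surgery != c.chatgpt_surgery]
    return concordant, discordant


def unanimous(votes):
    """Assumption: 'consensus' means every reviewer cast the same vote."""
    return len(votes) > 0 and len(set(votes)) == 1


def adjudicate(discordant, n_reviewers=3):
    """Count unanimous reviews and how often each reviewer sided with ChatGPT."""
    consensus = sum(unanimous(c.reviewer_votes) for c in discordant)
    aligned_with_chatgpt = [
        sum(c.reviewer_votes[i] == c.chatgpt_surgery for c in discordant)
        for i in range(n_reviewers)
    ]
    return consensus, aligned_with_chatgpt


# Toy data (hypothetical), not cases from the study.
cases = [
    Case(1, True, True),
    Case(2, False, False),
    Case(3, False, True, (True, True, True)),
    Case(4, True, False, (True, False, True)),
]
concordant, discordant = split_concordant(cases)
print(len(concordant), len(discordant))
print(adjudicate(discordant))
```

On the toy data this prints `2 2` and `(1, [1, 2, 1])`: one discordant case received a unanimous review, and the second reviewer sided with ChatGPT in both discordant cases. Applied to the study, the analogous counts would be 133 concordant and 17 discordant cases, with 11 unanimous reviews.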

Results: ChatGPT and the MDT agreed on surgical recommendations in 133 of 150 cases, an 88.7% concordance (Cohen's Kappa = 0.764, p<.001). In the 17 discordant cases, ChatGPT recommended surgery more often than the MDT, but this difference was not statistically significant (McNemar's test statistic = 1.23, p=.46). The 3 independent surgeons reached consensus on 11 of the 17 discordant cases (64.7%), highlighting variability among experts; individual surgeons aligned with ChatGPT in 5 to 6 cases each (29.4%-35.3%).
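The reported proportions can be checked with a few lines of arithmetic, and the same helpers compute κ and McNemar's test from any 2×2 table. This is a minimal standard-library Python sketch with hypothetical function names. Because the abstract gives only the 133 agreements and 17 disagreements, not the full 2×2 table, the code does not try to reproduce the reported κ or McNemar statistic, and it does not assume which McNemar variant the authors used.

```python
import math


def cohens_kappa(a: int, b: int, c: int, d: int) -> float:
    """Cohen's kappa for a 2x2 table.

    a = both recommend surgery, d = both recommend no surgery,
    b = ChatGPT surgery / MDT no surgery, c = ChatGPT no surgery / MDT surgery.
    """
    n = a + b + c + d
    p_o = (a + d) / n                                      # observed agreement
    p_e = ((a + b) * (a + c) + (c + d) * (b + d)) / n**2   # chance agreement from the marginals
    return (p_o - p_e) / (1 - p_e)


def mcnemar_chi2(b: int, c: int, correction: bool = True) -> tuple[float, float]:
    """McNemar chi-square statistic (1 df) and p-value from the discordant counts."""
    if b + c == 0:
        return 0.0, 1.0                                    # no discordant pairs: no evidence of asymmetry
    diff = abs(b - c) - (1 if correction else 0)           # optional continuity correction
    stat = max(diff, 0) ** 2 / (b + c)
    p = math.erfc(math.sqrt(stat / 2))                     # upper tail of chi-square with 1 df
    return stat, p


def mcnemar_exact(b: int, c: int) -> float:
    """Exact two-sided McNemar p-value: binomial test of min(b, c) with p = 0.5."""
    n = b + c
    tail = sum(math.comb(n, i) for i in range(min(b, c) + 1)) / 2**n
    return min(1.0, 2 * tail)


def implied_expected_agreement(p_o: float, kappa: float) -> float:
    """Back-calculate chance agreement p_e by solving kappa = (p_o - p_e) / (1 - p_e)."""
    return (p_o - kappa) / (1 - kappa)


# Checks using only figures stated in the abstract.
p_o = 133 / 150
print(round(p_o, 3))                                       # 0.887
print(round(implied_expected_agreement(p_o, 0.764), 3))    # 0.52
print([round(k / 17, 3) for k in (11, 5, 6)])              # [0.647, 0.294, 0.353]
```

The back-calculated chance agreement of about 0.52 follows directly from the reported observed agreement and κ. It constrains, but does not determine, how the surgery recommendations were split between the two raters.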

Conclusions: Substantial agreement between ChatGPT and the MDT suggests that ChatGPT has sensitivity comparable to the MDT's in identifying surgical candidates among suspected CES cases. Variability among surgeons on the discordant cases underscores the subjectivity of CES triage. ChatGPT may be a valuable adjunct in high-stakes clinical decision-making, though further validation and refinement are needed.

Source journal
Spine Journal (Medicine: Clinical Neurology)
CiteScore: 8.20
Self-citation rate: 6.70%
Articles published: 680
Review time: 13.1 weeks
About the journal: The Spine Journal, the official journal of the North American Spine Society, is an international and multidisciplinary journal that publishes original, peer-reviewed articles on research and treatment related to the spine and spine care, including basic science and clinical investigations. It is a condition of publication that manuscripts submitted to The Spine Journal have not been published, and will not be simultaneously submitted or published elsewhere. The Spine Journal also publishes major reviews of specific topics by acknowledged authorities, technical notes, teaching editorials, and other special features. Letters to the Editor-in-Chief are encouraged.