Human versus machine: deciding on high-stakes surgery in possible Cauda Equina syndrome
Elie Najjar, Ahmed Abdelazim Hassan, Rodrigo Muscogliati, Khalid M Salem, Nasir A Quraishi
Spine Journal. Published 2025-05-08. DOI: 10.1016/j.spinee.2025.05.026 (https://doi.org/10.1016/j.spinee.2025.05.026)
Abstract
Background context: Cauda Equina Syndrome (CES) is a spine surgical urgency requiring prompt intervention to prevent neurological deficits. Accurate identification of CES cases needing urgent surgery is essential to avoid long-term sequelae.
Purpose: To evaluate the concordance between an AI language model (ChatGPT) and a Spinal Multidisciplinary Team (MDT) in recommending surgical intervention for suspected CES cases.
Study design/setting: Retrospective concordance analysis comparing surgical recommendations between ChatGPT and a Spinal MDT.
Patient sample: Among 160 referrals presenting with red flags for possible CES, 10 cases were used to calibrate ChatGPT to specific clinical and diagnostic parameters, with the remaining 150 cases included in the primary analysis. The average patient age was 50.6 years (range 18-87), with a male-to-female ratio of 68:82.
Outcome measures: The primary outcome was the concordance rate between ChatGPT and the MDT in recommending surgery, evaluated through agreement rates and statistical analysis.
Methods: Each of the 150 cases was presented as standardized slides including clinical history, imaging, and examination findings. Both the MDT and ChatGPT assessed the need for urgent surgery. Discordant cases (n=17) were further reviewed by 3 spinal surgeons blinded to prior decisions.
Results: ChatGPT and the MDT agreed on surgical recommendations in 133 out of 150 cases, achieving an 88.7% concordance (Cohen's Kappa = 0.764, p<.001). ChatGPT recommended surgery more frequently in the 17 discordant cases, but this difference was not statistically significant (McNemar's test statistic = 1.23, p=.46). Review by 3 independent surgeons reached consensus on 11 of the 17 discordant cases (64.7%), highlighting variability among experts; individual surgeons aligned with ChatGPT in 5 to 6 cases each (29.4%-35.3%).
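The percentages in the Results follow directly from the reported counts, and can be checked with a few lines of Python. The sketch below is illustrative only and is not the authors' analysis code: the function names `percent_agreement` and `cohen_kappa` are hypothetical, and the kappa function uses the standard textbook formula for a 2x2 table of two raters' yes/no decisions. The abstract does not report the individual cell counts of that table, so the reported kappa of 0.764 cannot be recomputed here.

```python
def percent_agreement(agree: int, total: int) -> float:
    """Raw agreement rate, expressed as a percentage."""
    return 100.0 * agree / total


def cohen_kappa(a: int, b: int, c: int, d: int) -> float:
    """Cohen's kappa for a 2x2 table of two raters' yes/no decisions.

    a = both raters say yes      b = rater 1 yes, rater 2 no
    c = rater 1 no, rater 2 yes  d = both raters say no
    (Undefined when expected agreement equals 1.)
    """
    n = a + b + c + d
    p_observed = (a + d) / n
    # Agreement expected by chance, from each rater's marginal rates.
    p_yes = ((a + b) / n) * ((a + c) / n)
    p_no = ((c + d) / n) * ((b + d) / n)
    p_expected = p_yes + p_no
    return (p_observed - p_expected) / (1 - p_expected)


# Counts taken from the abstract.
print(round(percent_agreement(133, 150), 1))  # 88.7  (ChatGPT vs MDT)
print(round(percent_agreement(11, 17), 1))    # 64.7  (surgeon consensus)
print(round(percent_agreement(5, 17), 1))     # 29.4  (lower bound, surgeon-ChatGPT)
print(round(percent_agreement(6, 17), 1))     # 35.3  (upper bound, surgeon-ChatGPT)
```

Kappa is reported alongside raw agreement because it discounts the agreement two raters would reach by chance alone, which matters when one decision (surgery or no surgery) is much more common than the other.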
Conclusions: Substantial agreement between ChatGPT and the MDT suggests ChatGPT's comparable sensitivity in detecting surgical candidates in CES cases. Variability among surgeons on discordant cases underscores subjectivity in CES triage. ChatGPT may be a valuable adjunct in high-stakes clinical decision-making, though further validation and refinement are needed.
About the journal:
The Spine Journal, the official journal of the North American Spine Society, is an international and multidisciplinary journal that publishes original, peer-reviewed articles on research and treatment related to the spine and spine care, including basic science and clinical investigations. It is a condition of publication that manuscripts submitted to The Spine Journal have not been published, and will not be simultaneously submitted or published elsewhere. The Spine Journal also publishes major reviews of specific topics by acknowledged authorities, technical notes, teaching editorials, and other special features. Letters to the Editor-in-Chief are encouraged.