Use of ChatGPT for Determining Clinical and Surgical Treatment of Lumbar Disc Herniation With Radiculopathy: A North American Spine Society Guideline Comparison.

Neurospine. Pub Date: 2024-03-01 (Epub 2024-01-31). Pages 149-158. DOI: 10.14245/ns.2347052.526
IF 3.8 · CAS Tier 2 (Medicine) · JCR Q1 (Clinical Neurology)
Mateo Restrepo Mejia, Juan Sebastian Arroyave, Michael Saturno, Laura Chelsea Mazudie Ndjonko, Bashar Zaidat, Rami Rajjoub, Wasil Ahmed, Ivan Zapolsky, Samuel K Cho

Abstract

Objective: Large language models like chat generative pre-trained transformer (ChatGPT) have found success in various sectors, but their application in the medical field remains limited. This study aimed to assess the feasibility of using ChatGPT to provide accurate medical information to patients, specifically evaluating how well ChatGPT versions 3.5 and 4 aligned with the 2012 North American Spine Society (NASS) guidelines for lumbar disc herniation with radiculopathy.

Methods: ChatGPT's responses to questions based on the NASS guidelines were analyzed for accuracy. Three new categories were introduced to deepen the analysis: overconclusiveness, supplementary information, and incompleteness. Overconclusiveness referred to recommendations not mentioned in the NASS guidelines; supplementary information denoted additional relevant details; and incompleteness indicated crucial information from the NASS guidelines that was omitted.

Results: Of the 29 clinical guidelines evaluated, ChatGPT-3.5 was accurate in 15 responses (52%), while ChatGPT-4 was accurate in 17 responses (59%). ChatGPT-3.5 was overconclusive in 14 responses (48%), while ChatGPT-4 was overconclusive in 13 responses (45%). Additionally, ChatGPT-3.5 provided supplementary information in 24 responses (83%), and ChatGPT-4 did so in 27 responses (93%). ChatGPT-3.5 was incomplete in 11 responses (38%), while ChatGPT-4 was incomplete in 8 responses (23%).

Conclusion: ChatGPT shows promise for clinical decision-making, but both patients and healthcare providers should exercise caution to ensure safety and quality of care. While these results are encouraging, further research is necessary to validate the use of large language models in clinical settings.
