Evaluation of retrieval-augmented generation and large language models in clinical guidelines for degenerative spine conditions.

Impact factor: 2.6 · CAS Zone 3 (Medicine) · JCR Q2 (Clinical Neurology)
Audrey Y Su, Ashley Knebel, Andrew Y Xu, Marco Kaper, Phillip Schmitt, Joseph E Nassar, Manjot Singh, Michael J Farias, Jinho Kim, Bassel G Diebo, Alan H Daniels
Journal: European Spine Journal
Publication date: 2025-07-07 (Journal Article)
DOI: 10.1007/s00586-025-08994-8 (https://doi.org/10.1007/s00586-025-08994-8)

Abstract

Purpose: Degenerative spinal diseases often require complex, patient-specific treatment, presenting a compelling challenge for artificial intelligence (AI) integration into clinical practice. While existing literature has focused on ChatGPT-4o performance in individual spine conditions, this study compares ChatGPT-4o, a traditional large language model (LLM), against NotebookLM, a novel retrieval-augmented model (RAG-LLM) supplemented with North American Spine Society (NASS) guidelines, for concordance with all five published NASS guidelines for degenerative spinal diseases.

Methods: A total of 118 questions from NASS guidelines regarding five degenerative spinal conditions were presented to ChatGPT-4o and NotebookLM. All responses were scored on four criteria: accuracy, evidence-based conclusions, supplementary information, and completeness of information.

Results: Overall, NotebookLM provided significantly more accurate responses (98.3% vs. 40.7%, p < 0.05), more evidence-based conclusions (99.1% vs. 40.7%, p < 0.05), and more complete information (94.1% vs. 79.7%, p < 0.05), while ChatGPT-4o provided more supplementary information (98.3% vs. 67.8%, p < 0.05). These discrepancies were most prominent for nonsurgical and surgical interventions, where ChatGPT-4o often produced recommendations with unsubstantiated certainty.
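
The reported percentages can be checked against the 118-question denominator with a short calculation. The sketch below is illustrative only: it backs out integer counts from the rounded percentages and applies a pooled two-proportion z-test, which is an assumption, since the abstract does not state which statistical test the authors used. Because both models answered the same questions, a paired test (for example McNemar's) may be more appropriate, but it needs per-question data that the abstract does not provide. The names `to_count` and `two_proportion_z_test` are hypothetical helpers, not part of the study.

```python
import math

N_QUESTIONS = 118  # total questions, from the Methods paragraph

# Percentages reported in the abstract, ordered (NotebookLM, ChatGPT-4o).
reported = {
    "accurate": (98.3, 40.7),
    "evidence_based": (99.1, 40.7),
    "complete": (94.1, 79.7),
    "supplementary": (67.8, 98.3),
}


def to_count(percent, n=N_QUESTIONS):
    """Back out the integer count that rounds to the reported percentage."""
    return round(percent / 100 * n)


def two_proportion_z_test(x1, x2, n1, n2):
    """Two-sided pooled z-test for a difference between two proportions.

    ASSUMPTION: the test choice is illustrative; the abstract does not
    name the test used, and this one ignores the paired design.
    """
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail area
    return z, p_value


for criterion, (pct_rag, pct_llm) in reported.items():
    x_rag, x_llm = to_count(pct_rag), to_count(pct_llm)
    z, p = two_proportion_z_test(x_rag, x_llm, N_QUESTIONS, N_QUESTIONS)
    print(f"{criterion}: {x_rag}/{N_QUESTIONS} vs {x_llm}/{N_QUESTIONS}, "
          f"z={z:.2f}, p={p:.3g}")
```

Under these assumptions, the accuracy figures correspond to 116 of 118 responses for NotebookLM and 48 of 118 for ChatGPT-4o, and each of the four comparisons falls below the 0.05 threshold, consistent with the reported results.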

Conclusion: While RAG-LLMs are a promising tool for clinical decision-making assistance and show significant improvement over prior models, physicians should remain cautious when integrating AI into patient care, especially in nuanced medical scenarios.

Source journal: European Spine Journal (Medicine, Clinical Neurology)
CiteScore: 4.80
Self-citation rate: 10.70%
Articles published: 373
Review time: 2-4 weeks
Journal description: "European Spine Journal" is a publication founded in response to the increasing trend toward specialization in spinal surgery and spinal pathology in general. The Journal is devoted to all spine-related disciplines, including functional and surgical anatomy of the spine, biomechanics and pathophysiology, diagnostic procedures, and neurology, surgery and outcomes. The aim of "European Spine Journal" is to support the further development of highly innovative spine treatments, including but not restricted to surgery, and to provide an integrated and balanced view of diagnostic, research and treatment procedures as well as outcomes that will enhance effective collaboration among specialists worldwide. The "European Spine Journal" also participates in education by means of videos, interactive meetings and the endorsement of educative efforts. It is the official publication of EUROSPINE, The Spine Society of Europe.