Performance of a Large Language Model in the Generation of Clinical Guidelines for Antibiotic Prophylaxis in Spine Surgery

IF 3.8 · CAS Tier 2 (Medicine) · Q1 CLINICAL NEUROLOGY
Neurospine · Pub Date: 2024-03-01 · DOI: 10.14245/ns.2347310.655
Bashar Zaidat, Nancy Shrestha, Ashley M. Rosenberg, Wasil Ahmed, Rami Rajjoub, Timothy Hoang, Mateo Restrepo Mejia, A. Duey, Justin E. Tang, Jun S. Kim, Samuel K. Cho
{"title":"Performance of a Large Language Model in the Generation of Clinical Guidelines for Antibiotic Prophylaxis in Spine Surgery","authors":"Bashar Zaidat, Nancy Shrestha, Ashley M. Rosenberg, Wasil Ahmed, Rami Rajjoub, Timothy Hoang, Mateo Restrepo Mejia, A. Duey, Justin E. Tang, Jun S. Kim, Samuel K. Cho","doi":"10.14245/ns.2347310.655","DOIUrl":null,"url":null,"abstract":"Objective Large language models, such as chat generative pre-trained transformer (ChatGPT), have great potential for streamlining medical processes and assisting physicians in clinical decision-making. This study aimed to assess the potential of ChatGPT’s 2 models (GPT-3.5 and GPT-4.0) to support clinical decision-making by comparing its responses for antibiotic prophylaxis in spine surgery to accepted clinical guidelines. Methods ChatGPT models were prompted with questions from the North American Spine Society (NASS) Evidence-based Clinical Guidelines for Multidisciplinary Spine Care for Antibiotic Prophylaxis in Spine Surgery (2013). Its responses were then compared and assessed for accuracy. Results Of the 16 NASS guideline questions concerning antibiotic prophylaxis, 10 responses (62.5%) were accurate in ChatGPT’s GPT-3.5 model and 13 (81%) were accurate in GPT-4.0. Twenty-five percent of GPT-3.5 answers were deemed as overly confident while 62.5% of GPT-4.0 answers directly used the NASS guideline as evidence for its response. Conclusion ChatGPT demonstrated an impressive ability to accurately answer clinical questions. GPT-3.5 model’s performance was limited by its tendency to give overly confident responses and its inability to identify the most significant elements in its responses. GPT-4.0 model’s responses had higher accuracy and cited the NASS guideline as direct evidence many times. While GPT-4.0 is still far from perfect, it has shown an exceptional ability to extract the most relevant research available compared to GPT-3.5. Thus, while ChatGPT has shown far-reaching potential, scrutiny should still be exercised regarding its clinical use at this time.","PeriodicalId":19269,"journal":{"name":"Neurospine","volume":null,"pages":null},"PeriodicalIF":3.8000,"publicationDate":"2024-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Neurospine","FirstCategoryId":"3","ListUrlMain":"https://doi.org/10.14245/ns.2347310.655","RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"CLINICAL NEUROLOGY","Score":null,"Total":0}
Citations: 1

Abstract

Objective: Large language models, such as Chat Generative Pre-trained Transformer (ChatGPT), have great potential for streamlining medical processes and assisting physicians in clinical decision-making. This study aimed to assess the potential of two ChatGPT models (GPT-3.5 and GPT-4.0) to support clinical decision-making by comparing their responses regarding antibiotic prophylaxis in spine surgery against accepted clinical guidelines.

Methods: The ChatGPT models were prompted with questions from the North American Spine Society (NASS) Evidence-Based Clinical Guidelines for Multidisciplinary Spine Care for Antibiotic Prophylaxis in Spine Surgery (2013). Their responses were then compared against the guidelines and assessed for accuracy.

Results: Of the 16 NASS guideline questions concerning antibiotic prophylaxis, 10 responses (62.5%) were accurate for the GPT-3.5 model and 13 (81.3%) were accurate for GPT-4.0. Twenty-five percent of GPT-3.5 answers were deemed overly confident, while 62.5% of GPT-4.0 answers directly cited the NASS guideline as evidence.

Conclusion: ChatGPT demonstrated an impressive ability to answer clinical questions accurately. The GPT-3.5 model's performance was limited by its tendency to give overly confident responses and its inability to identify the most significant elements in its responses. The GPT-4.0 model's responses were more accurate and frequently cited the NASS guideline as direct evidence. While GPT-4.0 is still far from perfect, it showed an exceptional ability, relative to GPT-3.5, to extract the most relevant available research. Thus, while ChatGPT has shown far-reaching potential, scrutiny should still be exercised regarding its clinical use at this time.
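The prompt-and-grade workflow described in the Methods section can be sketched programmatically. The snippet below is a minimal illustration using the OpenAI Python SDK; the model identifiers, the sample question, and the automated collection loop are assumptions for illustration only, since the study queried the ChatGPT interface directly and accuracy was judged against the NASS recommendations by human reviewers.

```python
# Minimal sketch of the prompt-and-collect workflow described in the abstract.
# Assumptions (not from the paper): OpenAI API access, the model identifiers
# below, and a hand-curated list of the 16 NASS guideline questions.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Illustrative stand-in for the 16 NASS (2013) antibiotic-prophylaxis questions.
QUESTIONS = [
    "Does preoperative antibiotic prophylaxis reduce infection rates in spine surgery?",
    # ... remaining guideline questions ...
]

MODELS = ["gpt-3.5-turbo", "gpt-4"]  # hypothetical analogues of GPT-3.5 / GPT-4.0


def ask(model: str, question: str) -> str:
    """Send one guideline question to one model and return its answer text."""
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": question}],
    )
    return response.choices[0].message.content


# Collect answers for later manual grading against the NASS recommendations;
# the paper's accuracy judgments were made by human reviewers, not by code.
answers = {m: [ask(m, q) for q in QUESTIONS] for m in MODELS}

for model, responses in answers.items():
    print(f"--- {model}: {len(responses)} responses collected ---")
```

Note that the reported accuracy figures (10 of 16 and 13 of 16 correct) come from manual comparison against the guideline text, so a script like this would only automate the response collection, not the expert grading.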
Source Journal
Neurospine
CiteScore
5.80
Self-citation rate
18.80%
Articles published
93
Review turnaround
10 weeks