Large language model-generated clinical practice guideline for appendicitis.

Impact Factor: 2.4 · CAS Region 2 (Medicine) · JCR Q2 (Surgery)
Amy Boyle, Bright Huo, Patricia Sylla, Elisa Calabrese, Sunjay Kumar, Bethany J Slater, Danielle S Walsh, R Wesley Vosburg
Journal: Surgical Endoscopy And Other Interventional Techniques, pages 3539-3551
DOI: 10.1007/s00464-025-11723-3
Published: 2025-06-01 (Epub 2025-04-18)
Citations: 0

Abstract

Background: Clinical practice guidelines provide important evidence-based recommendations to optimize patient care, but their development is labor-intensive and time-consuming. Large language models (LLMs) have shown promise in supporting academic writing and the development of systematic reviews, but their ability to assist with guideline development has not been explored. In this study, we tested the capacity of LLMs to support each stage of guideline development, using the latest SAGES guideline on the surgical management of appendicitis as a comparison.

Methods: Prompts were engineered to trigger LLMs to perform each task of guideline development, using key questions and PICOs derived from the SAGES guideline. ChatGPT-4, Google Gemini, Consensus, and Perplexity were queried on February 21, 2024. LLM performance was evaluated qualitatively, with narrative descriptions of each task's output. The Appraisal of Guidelines for Research and Evaluation in Surgery (AGREE-S) instrument was used to quantitatively assess the quality of the LLM-derived guideline compared to the existing SAGES guideline.

Results: Popular LLMs were able to generate a search syntax, perform data analysis, and follow the GRADE approach and Evidence-to-Decision framework to produce guideline recommendations. These LLMs were unable to independently perform a systematic literature search or reliably perform screening, data extraction, or risk of bias assessment at the time of testing. AGREE-S appraisal produced a total score of 119 for the LLM-derived guideline and 156 for the SAGES guideline. In 19 of the 24 domains, the two guidelines scored within two points of each other.

Conclusions: LLMs demonstrate potential to assist with certain steps of guideline development, which may reduce time and resource burden associated with these tasks. As new models are developed, the role for LLMs in guideline development will continue to evolve. Ongoing research and multidisciplinary collaboration are needed to support the safe and effective integration of LLMs in each step of guideline development.

Source journal
CiteScore: 6.10
Self-citation rate: 12.90%
Articles per year: 890
Review time: 6 months
Journal description: Uniquely positioned at the interface between various medical and surgical disciplines, Surgical Endoscopy serves as a focal point for the international surgical community to exchange information on practice, theory, and research. Topics covered in the journal include: surgical aspects of interventional endoscopy, ultrasound, and other techniques in the fields of gastroenterology, obstetrics, gynecology, and urology; gastroenterologic surgery; thoracic surgery; traumatic surgery; orthopedic surgery; and pediatric surgery.