Performance of a Large Language Model in the Generation of Clinical Guidelines for Antibiotic Prophylaxis in Spine Surgery

IF 4.3 · Zone 3 (Materials Science) · JCR Q1 (ENGINEERING, ELECTRICAL & ELECTRONIC)

ACS Applied Electronic Materials Pub Date : 2024-03-01 DOI:10.14245/ns.2347310.655

Bashar Zaidat, Nancy Shrestha, Ashley M. Rosenberg, Wasil Ahmed, Rami Rajjoub, Timothy Hoang, Mateo Restrepo Mejia, A. Duey, Justin E. Tang, Jun S. Kim, Samuel K. Cho


Citations: 1

Abstract


Performance of a Large Language Model in the Generation of Clinical Guidelines for Antibiotic Prophylaxis in Spine Surgery

Objective: Large language models, such as chat generative pre-trained transformer (ChatGPT), have great potential for streamlining medical processes and assisting physicians in clinical decision-making. This study aimed to assess the potential of ChatGPT’s 2 models (GPT-3.5 and GPT-4.0) to support clinical decision-making by comparing their responses on antibiotic prophylaxis in spine surgery to accepted clinical guidelines.

Methods: ChatGPT models were prompted with questions from the North American Spine Society (NASS) Evidence-based Clinical Guidelines for Multidisciplinary Spine Care for Antibiotic Prophylaxis in Spine Surgery (2013). Their responses were then compared against the guidelines and assessed for accuracy.

Results: Of the 16 NASS guideline questions concerning antibiotic prophylaxis, 10 responses (62.5%) were accurate in ChatGPT’s GPT-3.5 model and 13 (81%) were accurate in GPT-4.0. Twenty-five percent of GPT-3.5 answers were deemed overly confident, while 62.5% of GPT-4.0 answers directly used the NASS guideline as evidence for the response.

Conclusion: ChatGPT demonstrated an impressive ability to accurately answer clinical questions. The GPT-3.5 model’s performance was limited by its tendency to give overly confident responses and its inability to identify the most significant elements in its responses. The GPT-4.0 model’s responses had higher accuracy and cited the NASS guideline as direct evidence many times. While GPT-4.0 is still far from perfect, it has shown an exceptional ability to extract the most relevant research available compared to GPT-3.5. Thus, while ChatGPT has shown far-reaching potential, scrutiny should still be exercised regarding its clinical use at this time.
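The accuracy figures in the Results can be checked with simple arithmetic. The sketch below is illustrative only and is not the authors' analysis code: the function name `accuracy_pct` and the constant `TOTAL_QUESTIONS` are hypothetical, and the only inputs taken from the abstract are the counts 10 and 13 out of 16 questions.

```python
# Illustrative check of the reported accuracy percentages.
# Names here are hypothetical; only the counts come from the abstract.

TOTAL_QUESTIONS = 16  # NASS guideline questions posed to each model


def accuracy_pct(correct: int, total: int = TOTAL_QUESTIONS) -> float:
    """Return the share of accurate responses as a percentage."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * correct / total


gpt35_accuracy = accuracy_pct(10)  # GPT-3.5: 10 of 16 accurate
gpt40_accuracy = accuracy_pct(13)  # GPT-4.0: 13 of 16 accurate

print(f"GPT-3.5: {gpt35_accuracy:.1f}%")
print(f"GPT-4.0: {gpt40_accuracy:.2f}%")
```

The exact value for GPT-4.0 is 81.25%, which the abstract reports rounded to 81%.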


Source journal

ACS Applied Electronic Materials

CiteScore: 7.20

Self-citation rate: 4.30%

Articles published: 567

About the journal: ACS Applied Electronic Materials is an interdisciplinary journal publishing original research covering all aspects of electronic materials. The journal is devoted to reports of new and original experimental and theoretical research of an applied nature that integrate knowledge in the areas of materials science, engineering, optics, physics, and chemistry into important applications of electronic materials. Sample research topics that span the journal's scope are inorganic, organic, ionic and polymeric materials with properties that include conducting, semiconducting, superconducting, insulating, dielectric, magnetic, optoelectronic, piezoelectric, ferroelectric and thermoelectric.

Indexed/Abstracted: Web of Science SCIE, Scopus, CAS, INSPEC, Portico