The applications of ChatGPT and other large language models in anesthesiology and critical care: a systematic review.

IF 3.3 | Region 3, Medicine | JCR Q1 ANESTHESIOLOGY

Canadian Journal of Anesthesia-Journal Canadien D Anesthesie Pub Date : 2025-06-01 Epub Date: 2025-06-16 DOI:10.1007/s12630-025-02973-9

Nicolas Daccache, Joe Zako, Louis Morisson, Pascal Laferrière-Langlois

Pages: 904-922

Citations: 0

Abstract

Purpose: ChatGPT and other large language models (LLMs) have gained immense popularity since their commercial release in 2022, with applications in various sectors including health care. We sought to evaluate their deployment in anesthesiology and critical care in a systematic review. Our aim was to describe the integration of LLMs in the field by showcasing and categorizing their current applications, assessing their performance in patient care, and reviewing application-specific ethical and practical challenges in deployment.

Methods: Respecting Preferred Reporting Items for Systematic Reviews and Meta-analyses (PRISMA) guidelines, we systematically searched through PubMed®, Embase, the Cochrane Central Register of Controlled Trials, and Web of Science®, from inception until 1 August 2024. We extracted all papers investigating LLMs in anesthesiology or critical care and reporting results. We segmented the literature into major themes and highlighted key findings and limitations.

Results: From 480 retrieved articles, we included 45 papers. The evaluated models (GPT-4, GPT-3.5, Google Bard [now Gemini], LLaMA, and others) showed diverse applications in four segments: intensive care unit, patient education, medical education, and perioperative care. Large language models, especially newer models, are promising in predicting clinical scores, navigating simple clinical scenarios, and managing preoperative anxiety. Their performance remains below the clinician level in predicting outcomes, solving complex clinical scenarios (e.g., airway management), answering board examination questions, and generating patient-directed documents, although newer models performed better than older ones.

Conclusion: While LLMs are not yet equipped to fully assist physicians in anesthesiology and critical care, they have significant potential, and their capabilities are rapidly improving. Supervised use for select tasks can streamline patient care. Further trials are warranted as new versions of models become available.

Study registration: PROSPERO ( CRD42024567380 ); first submitted 22 July 2024.

Source journal

Canadian Journal of Anesthesia-Journal Canadien D Anesthesie (Medicine - Anesthesiology)

CiteScore

8.50

Self-citation rate

7.10%

Articles published

161

Review time

6-12 weeks

About the journal: The Canadian Journal of Anesthesia (the Journal) is owned by the Canadian Anesthesiologists’ Society and is published by Springer Science + Business Media, LLC (New York). Since its first year of publication in 1954, the Journal's international exposure has broadened considerably, with articles now received from over 50 countries. The Journal is published monthly and has an Impact Factor (mean journal citation frequency) of 2.127 (in 2012). Article types consist of invited editorials, reports of original investigations (clinical and basic sciences articles), case reports/case series, review articles, systematic reviews, accredited continuing professional development (CPD) modules, and Letters to the Editor. The editorial content, according to the mission statement, spans the fields of anesthesia, acute and chronic pain, perioperative medicine, and critical care. In addition, the Journal publishes practice guidelines and standards articles relevant to clinicians. Articles are published in either English or French, according to the language of submission.