ChatGPT Predicts In-Hospital All-Cause Mortality for Sepsis: In-Context Learning with the Korean Sepsis Alliance Database.

Impact factor: 2.3 (JCR Q3, Medical Informatics)

Healthcare Informatics Research, 30(3), 266-276. Pub Date: 2024-07-01. Epub Date: 2024-07-31. DOI: 10.4258/hir.2024.30.3.266

Namkee Oh, Won Chul Cha, Jun Hyuk Seo, Seong-Gyu Choi, Jong Man Kim, Chi Ryang Chung, Gee Young Suh, Su Yeon Lee, Dong Kyu Oh, Mi Hyeon Park, Chae-Man Lim, Ryoung-Eun Ko


Citations: 0

Abstract

Objectives: Sepsis is a leading global cause of mortality, and predicting its outcomes is vital for improving patient care. This study explored the capabilities of ChatGPT, a state-of-the-art natural language processing model, in predicting in-hospital mortality for sepsis patients.

Methods: This study utilized data from the Korean Sepsis Alliance (KSA) database, collected between 2019 and 2021, focusing on adult intensive care unit (ICU) patients and aiming to determine whether ChatGPT could predict all-cause mortality after ICU admission at 7 and 30 days. Structured prompts enabled ChatGPT to engage in in-context learning, with the number of patient examples varying from zero to six. The predictive capabilities of ChatGPT-3.5-turbo and ChatGPT-4 were then compared against a gradient boosting model (GBM) using various performance metrics.
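The in-context learning setup described in the Methods can be pictured with a minimal sketch of a k-shot prompt builder, where k ranges from zero to six. The abstract does not give the actual prompt wording or the full variable list, so the task sentence, the feature names in `FEATURES`, and the example patients below are hypothetical and the values are made up for illustration only.

```python
from typing import Dict, List

# Hypothetical feature names: the abstract mentions age, SOFA score, and
# lactate, but does not list the variables actually placed in the prompt.
FEATURES = ["age", "sofa_score", "lactate_mmol_l"]


def format_patient(patient: Dict) -> str:
    """Render one patient's features as a single comma-separated line."""
    return ", ".join(f"{name}={patient[name]}" for name in FEATURES)


def build_prompt(examples: List[Dict], query: Dict, horizon_days: int) -> str:
    """Assemble a k-shot prompt; k = len(examples), and k = 0 is zero-shot."""
    lines = [
        "Task: predict whether an adult ICU patient with sepsis dies within "
        f"{horizon_days} days of ICU admission. Answer 'died' or 'survived'.",
        "",
    ]
    # Each labeled example shows the model one patient and the known outcome.
    for i, ex in enumerate(examples, start=1):
        outcome = "died" if ex["died"] else "survived"
        lines.append(f"Example {i}: {format_patient(ex)} -> {outcome}")
    # The query patient is left unlabeled for the model to complete.
    lines.append(f"Patient: {format_patient(query)} -> ")
    return "\n".join(lines)


# Synthetic patients, not taken from the KSA database.
shots = [
    {"age": 81, "sofa_score": 12, "lactate_mmol_l": 6.2, "died": True},
    {"age": 54, "sofa_score": 4, "lactate_mmol_l": 1.4, "died": False},
]
query = {"age": 67, "sofa_score": 9, "lactate_mmol_l": 3.8}
print(build_prompt(shots, query, horizon_days=7))
```

Passing an empty list for `examples` yields the zero-shot variant; passing up to six labeled patients covers the range of shots the study reports.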

Results: From the KSA database, 4,786 patients formed the 7-day mortality prediction dataset, of whom 718 died, and 4,025 patients formed the 30-day dataset, with 1,368 deaths. Age and clinical markers (e.g., Sequential Organ Failure Assessment score and lactic acid levels) showed significant differences between survivors and non-survivors in both datasets. For 7-day mortality predictions, the area under the receiver operating characteristic curve (AUROC) was 0.70-0.83 for GPT-4, 0.51-0.70 for GPT-3.5, and 0.79 for GBM. The AUROC for 30-day mortality was 0.51-0.59 for GPT-4, 0.47-0.57 for GPT-3.5, and 0.76 for GBM. Zero-shot predictions of mortality from ICU admission to day 30 showed AUROCs from the mid-0.60s to 0.75 for GPT-4 and mainly from 0.47 to 0.63 for GPT-3.5.
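For readers less familiar with the headline metric, the sketch below shows one standard way to compute AUROC: the fraction of (non-survivor, survivor) pairs in which the non-survivor receives the higher predicted risk, with ties counted as half. It also reproduces the crude mortality rates implied by the counts above. This is an illustration of the metric, not the authors' evaluation code, and the toy labels and scores are made up.

```python
from typing import Sequence


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Pairwise (Mann-Whitney) AUROC; label 1 = died, 0 = survived."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # non-survivor correctly ranked above survivor
            elif p == n:
                wins += 0.5   # ties count as half a win
    return wins / (len(pos) * len(neg))


def crude_mortality(deaths: int, patients: int) -> float:
    """Proportion of patients who died."""
    return deaths / patients


# Toy example: three of the four (non-survivor, survivor) pairs are ordered correctly.
print(auroc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))   # 0.75

# Crude mortality from the reported counts.
print(round(crude_mortality(718, 4786), 3))    # 7-day dataset, about 15%
print(round(crude_mortality(1368, 4025), 3))   # 30-day dataset, about 34%
```

An AUROC of 0.5 corresponds to chance-level ranking, which puts the reported 30-day few-shot values for both GPT models (0.47-0.59) close to chance, while the GBM values (0.79 and 0.76) sit well above it.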

Conclusions: GPT-4 demonstrated potential in predicting short-term in-hospital mortality, although its performance varied across different evaluation metrics.


Source journal

Healthcare Informatics Research (Medical Informatics)

CiteScore

4.90

Self-citation rate

6.90%

Publication count