Evaluating GPT-4o for emergency disposition of complex respiratory cases with pulmonology consultation: a diagnostic accuracy study.

IF 3.1 | Zone 2, Medicine | JCR Q1, EMERGENCY MEDICINE
Cem Yıldırım, Ahmet Aykut, Ertuğ Günsoy, Mehmet Veysel Öncül
Scandinavian Journal of Trauma Resuscitation & Emergency Medicine, volume 33 (1), article 159. Journal Article, published 2025-10-02.
DOI: https://doi.org/10.1186/s13049-025-01475-3
PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12492850/pdf/
Citations: 0

Abstract

Evaluating GPT-4o for emergency disposition of complex respiratory cases with pulmonology consultation: a diagnostic accuracy study.

Background: Large Language Models (LLMs), such as GPT-4o, are increasingly investigated for clinical decision support in emergency medicine. However, their real-world performance in disposition prediction remains insufficiently studied. This study evaluated the diagnostic accuracy of GPT-4o in predicting ED disposition (discharge, ward admission, or ICU admission) in complex emergency respiratory cases requiring pulmonology consultation and chest CT, representing a selective high-acuity subgroup of ED patients.

Methods: We conducted a retrospective observational study in a tertiary ED between November 2024 and February 2025. We included ED patients with complex respiratory presentations who underwent pulmonology consultation and chest CT, representing a selective high-acuity subgroup rather than the general ED respiratory population. GPT-4o was prompted to predict the most appropriate ED disposition using three progressively enriched input models: Model 1 (age, sex, oxygen saturation, home oxygen therapy, and venous blood gas parameters); Model 2 (Model 1 plus laboratory data); and Model 3 (Model 2 plus chest CT findings). Model performance was assessed using accuracy, sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), and F1 score.
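
The six metrics named in the Methods all derive from a single 2x2 confusion matrix. The following is a minimal sketch of the standard definitions, not the authors' analysis code. The `Confusion` class, the `binary_metrics` function, and the example counts are hypothetical and chosen only for illustration; the sketch assumes every denominator is non-zero.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    """Counts for one binary outcome (hypothetical container, not from the paper)."""
    tp: int  # predicted positive, truly positive
    fp: int  # predicted positive, truly negative
    fn: int  # predicted negative, truly positive
    tn: int  # predicted negative, truly negative


def binary_metrics(c: Confusion) -> dict[str, float]:
    """Standard diagnostic accuracy metrics; assumes non-zero denominators."""
    return {
        "accuracy": (c.tp + c.tn) / (c.tp + c.fp + c.fn + c.tn),
        "sensitivity": c.tp / (c.tp + c.fn),   # recall on the positive class
        "specificity": c.tn / (c.tn + c.fp),   # recall on the negative class
        "ppv": c.tp / (c.tp + c.fp),           # precision
        "npv": c.tn / (c.tn + c.fn),
        "f1": 2 * c.tp / (2 * c.tp + c.fp + c.fn),
    }


# Illustrative counts only; these are NOT data from the study.
example = Confusion(tp=40, fp=10, fn=5, tn=45)
metrics = binary_metrics(example)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

For a three-way outcome such as discharge, ward, or ICU, one common approach is to compute these metrics once per outcome, treating that outcome as positive and the others as negative.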

Results: Among the 221 patients included, 69.2% were admitted to the ward, 9.0% to the intensive care unit (ICU), and 21.7% were discharged. For hospital admission prediction, Model 3 demonstrated the highest sensitivity (91.9%) and overall accuracy (76.5%), but the lowest specificity (20.8%). In contrast, for discharge prediction, Model 3 achieved the highest specificity (91.9%) but the lowest sensitivity (20.8%). Numerical improvements were observed across models, but none reached statistical significance (all p > 0.22). Model 1 therefore performed comparably to Models 2-3 while being less complex. Among patients who were discharged despite GPT-4o predicting admission, the 14-day ED re-presentation rates were 23.8% (5/21) for Model 1, 30.0% (9/30) for Model 2, and 28.9% (11/38) for Model 3.
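
The abstract reports percentages rather than counts, but the counts can be back-calculated from the cohort size. The sketch below does this for Model 3 under two assumptions that the abstract does not state explicitly: "hospital admission" pools ward and ICU admissions, and the denominator 38 in the Model 3 re-presentation rate is the number of discharged patients for whom GPT-4o predicted admission (the false positives). Function and variable names are illustrative.

```python
def counts_matching(pct: float, total: int) -> list[int]:
    """Return every integer k for which 100*k/total rounds to pct (one decimal)."""
    return [k for k in range(total + 1)
            if abs(round(100 * k / total, 1) - pct) < 1e-9]


def percent(numerator: int, denominator: int) -> float:
    return round(100 * numerator / denominator, 1)


N = 221  # cohort size reported in the abstract

# Tuple unpacking fails loudly unless exactly one count matches each percentage.
(ward,) = counts_matching(69.2, N)
(icu,) = counts_matching(9.0, N)
(discharged,) = counts_matching(21.7, N)
admitted = ward + icu  # assumption: admission = ward + ICU

# Model 3 with admission as the positive class.
(tp,) = counts_matching(91.9, admitted)  # reported sensitivity 91.9%
fn = admitted - tp
fp = 38                 # assumption: denominator of the reported 11/38
tn = discharged - fp

admission_view = {
    "sensitivity": percent(tp, tp + fn),
    "specificity": percent(tn, tn + fp),
    "accuracy": percent(tp + tn, N),
}
# Swapping the positive class to discharge swaps sensitivity and specificity.
discharge_view = {
    "sensitivity": percent(tn, tn + fp),
    "specificity": percent(tp, tp + fn),
}

print(ward, icu, discharged)  # 153 20 48
print(tp, fn, fp, tn)         # 159 14 38 10
print(admission_view)         # {'sensitivity': 91.9, 'specificity': 20.8, 'accuracy': 76.5}
print(discharge_view)         # {'sensitivity': 20.8, 'specificity': 91.9}
```

Under these assumptions the reconstructed counts reproduce the reported 76.5% accuracy, and the discharge figures are simply the admission figures with the positive and negative classes swapped, which is why 91.9% and 20.8% trade places between the two sentences.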

Conclusion: GPT-4o demonstrated high sensitivity in identifying ED patients requiring hospital admission, particularly those needing intensive care, when provided with progressively enriched clinical input. However, its low sensitivity for discharge prediction resulted in frequent overtriage, limiting its utility for autonomous decision-making. This proof-of-concept study demonstrates GPT-4o's capacity to stratify disposition decisions in complex respiratory cases under varying levels of limited input data. However, these findings should be interpreted in light of key limitations, including the selective high-acuity cohort and the absence of vital signs, and require prospective validation before clinical implementation.

Source journal
CiteScore: 6.10
Self-citation rate: 6.10%
Articles published: 57
Review time: 6-12 weeks
About the journal: The primary topics of interest in Scandinavian Journal of Trauma, Resuscitation and Emergency Medicine (SJTREM) are the pre-hospital and early in-hospital diagnostic and therapeutic aspects of emergency medicine, trauma, and resuscitation. Contributions focusing on dispatch, major incidents, etiology, pathophysiology, rehabilitation, epidemiology, prevention, education, training, implementation, work environment, as well as ethical and socio-economic aspects may also be assessed for publication.