Can AI match emergency physicians in managing common emergency cases? A comparative performance evaluation.

IF 2.3 · CAS Region 3 (Medicine) · JCR Q1 (Emergency Medicine)
Mehmet Gün
{"title":"Can AI match emergency physicians in managing common emergency cases? A comparative performance evaluation.","authors":"Mehmet Gün","doi":"10.1186/s12873-025-01303-y","DOIUrl":null,"url":null,"abstract":"<p><strong>Background: </strong>Large language models (LLMs) such as ChatGPT are increasingly explored for clinical decision support. However, their performance in high-stakes emergency scenarios remains underexamined. This study aimed to evaluate ChatGPT's diagnostic and therapeutic accuracy compared to a board-certified emergency physician across diverse emergency cases.</p><p><strong>Methods: </strong>This comparative study was conducted using 15 standardized emergency scenarios sourced from validated academic platforms (Geeky Medics, Life in the Fast Lane, Emergency Medicine Cases). ChatGPT (GPT-4) and a physician independently evaluated each case based on five predefined parameters: diagnosis, investigations, initial treatment, clinical safety, and decision-making complexity. Cases were scored out of 5. Concordance was categorized as high (5/5), moderate (4/5), or low (≤ 3/5). Wilson confidence intervals (95%) were calculated for each concordance category.</p><p><strong>Results: </strong>ChatGPT achieved high concordance (5/5) in 8 cases (53.3%, 95% CI: 27.6-77.0%), moderate concordance (4/5) in 4 cases (26.7%, CI: 10.3-55.4%), and low concordance (≤ 3/5) in 3 cases (20.0%, CI: 6.0-45.6%). Performance was strongest in structured, protocol-based conditions such as STEMI, DKA, and asthma. Lower performance was observed in complex scenarios like stroke, trauma with shock, and mixed acid-base disturbances.</p><p><strong>Conclusion: </strong>ChatGPT showed strong alignment with emergency physician decisions in structured scenarios but lacked reliability in complex cases. While AI may enhance decision-making and education, it cannot replace the clinical reasoning of human physicians. Its role is best framed as a supportive tool rather than a substitute.</p>","PeriodicalId":9002,"journal":{"name":"BMC Emergency Medicine","volume":"25 1","pages":"142"},"PeriodicalIF":2.3000,"publicationDate":"2025-07-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12315197/pdf/","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"BMC Emergency Medicine","FirstCategoryId":"3","ListUrlMain":"https://doi.org/10.1186/s12873-025-01303-y","RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"EMERGENCY MEDICINE","Score":null,"Total":0}
Citations: 0

Abstract

Background: Large language models (LLMs) such as ChatGPT are increasingly explored for clinical decision support. However, their performance in high-stakes emergency scenarios remains underexamined. This study aimed to evaluate ChatGPT's diagnostic and therapeutic accuracy compared to a board-certified emergency physician across diverse emergency cases.

Methods: This comparative study was conducted using 15 standardized emergency scenarios sourced from validated academic platforms (Geeky Medics, Life in the Fast Lane, Emergency Medicine Cases). ChatGPT (GPT-4) and a physician independently evaluated each case based on five predefined parameters: diagnosis, investigations, initial treatment, clinical safety, and decision-making complexity. Cases were scored out of 5. Concordance was categorized as high (5/5), moderate (4/5), or low (≤ 3/5). Wilson confidence intervals (95%) were calculated for each concordance category.
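
The Wilson score interval mentioned above can be computed directly from the binomial counts. Below is a minimal Python sketch, assuming the plain Wilson interval (no continuity correction) at z = 1.96; the published bounds are slightly wider than this formula yields, consistent with a continuity-corrected variant, so small differences from the paper's exact figures are expected. The helper name wilson_ci is illustrative, not from the paper.

```python
from math import sqrt

def wilson_ci(k: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion of k successes
    out of n trials (plain form, no continuity correction)."""
    p = k / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = (z / denom) * sqrt(p * (1 - p) / n + z * z / (4 * n * n))
    return max(0.0, center - half), min(1.0, center + half)

# Concordance counts reported in the study (n = 15 scenarios)
for label, k in [("high (5/5)", 8), ("moderate (4/5)", 4), ("low (<=3/5)", 3)]:
    lo, hi = wilson_ci(k, 15)
    print(f"{label}: {k}/15 = {k / 15:.1%}, 95% CI {lo:.1%}-{hi:.1%}")
```

Running this reproduces the reported proportions (53.3%, 26.7%, 20.0%), with interval bounds within a few percentage points of those published, again suggesting a continuity-corrected variant was used for the paper's exact figures.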

Results: ChatGPT achieved high concordance (5/5) in 8 cases (53.3%, 95% CI: 27.6-77.0%), moderate concordance (4/5) in 4 cases (26.7%, CI: 10.3-55.4%), and low concordance (≤ 3/5) in 3 cases (20.0%, CI: 6.0-45.6%). Performance was strongest in structured, protocol-based conditions such as STEMI, DKA, and asthma. Lower performance was observed in complex scenarios like stroke, trauma with shock, and mixed acid-base disturbances.

Conclusion: ChatGPT showed strong alignment with emergency physician decisions in structured scenarios but lacked reliability in complex cases. While AI may enhance decision-making and education, it cannot replace the clinical reasoning of human physicians. Its role is best framed as a supportive tool rather than a substitute.

Source journal: BMC Emergency Medicine (Medicine - Emergency Medicine)
CiteScore: 3.50
Self-citation rate: 8.00%
Annual article output: 178
Review time: 29 weeks
Journal description: BMC Emergency Medicine is an open access, peer-reviewed journal that considers articles on all urgent and emergency aspects of medicine, in both practice and basic research. The journal also covers disaster medicine and medicine in special settings, such as conflict areas and military medicine, together with articles on healthcare services in emergency departments.