ChatGPT-supported patient triage with voice commands in the emergency department: A prospective multicenter study

IF 2.7 · Q1 Emergency Medicine · CAS Tier 3 (Medicine)
Sinan Pasli , Metin Yadigaroğlu , Esma Nilay Kirimli , Muhammet Fatih Beşer , İhsan Unutmaz , Asu Özden Ayhan , Büşra Karakurt , Abdul Samet Şahin , Halil İbrahim Hiçyilmaz , Melih Imamoğlu
Citations: 0

Abstract


Background

Triage aims to prioritize patients according to their medical urgency by accurately evaluating their clinical conditions, managing waiting times efficiently, and improving the overall effectiveness of emergency care. This study aims to assess ChatGPT's performance in patient triage across four emergency departments with varying dynamics and to provide a detailed analysis of its strengths and weaknesses.

Methods

In this multicenter, prospective study, we compared the triage decisions made by ChatGPT-4o and the triage personnel with the gold standard decisions determined by an emergency medicine (EM) specialist. In the hospitals where we conducted the study, triage teams routinely direct patients to the appropriate ED areas based on the Emergency Severity Index (ESI) system and the hospital's local triage protocols. During the study period, the triage team collected patient data, including chief complaints, comorbidities, and vital signs, and used this information to make the initial triage decisions. An independent physician simultaneously entered the same data into ChatGPT using voice commands. At the same time, an EM specialist, present in the triage room throughout the study period, reviewed the same patient data and determined the gold standard triage decisions, strictly adhering to both the hospital's local protocols and the ESI system. Before initiating the study, we customized ChatGPT for each hospital by designing prompts that incorporated both the general principles of the ESI triage system and the specific triage rules of each hospital. The model's overall, hospital-based, and area-based performance was evaluated, with Cohen's Kappa, F1 score, and performance analyses conducted.
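The per-hospital customization described above pairs the general ESI algorithm with each hospital's local rules in the model's prompt. A minimal sketch of what such a prompt assembly might look like is below; the wording, rule text, and function name are illustrative assumptions, not the study's actual prompts.

```python
def build_triage_messages(hospital_rules, complaint, vitals, comorbidities):
    """Assemble a chat prompt combining general ESI guidance with
    hospital-specific triage rules (structure is an assumption)."""
    system = (
        "You are an emergency department triage assistant. "
        "Assign an ESI level (1-5) and a destination zone (red/yellow/green) "
        "using the ESI algorithm and these local rules:\n" + hospital_rules
    )
    user = (
        f"Chief complaint: {complaint}\n"
        f"Vital signs: {vitals}\n"
        f"Comorbidities: {comorbidities}\n"
        "Reply with only the ESI level and zone."
    )
    return [{"role": "system", "content": system},
            {"role": "user", "content": user}]

# Hypothetical example input -- not a case from the study.
msgs = build_triage_messages(
    hospital_rules="- Chest pain with an abnormal ECG goes to the red zone.",
    complaint="chest pain radiating to the left arm",
    vitals="BP 150/90, HR 104, SpO2 96%",
    comorbidities="hypertension, diabetes",
)
```

The resulting message list could then be sent to a chat-completion endpoint for a model such as GPT-4o; in the study the same information was dictated by voice rather than typed.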

Results

This study included 6657 patients. The overall agreement between triage personnel and GPT-4o with the gold standard was nearly perfect (Cohen's kappa = 0.782 and 0.833, respectively). The overall F1 score was 0.863 for the triage team, while GPT-4o achieved an F1 score of 0.897, demonstrating superior performance. ROC curve analysis showed the lowest performance in the yellow zone of a tertiary hospital (AUC = 0.75) and in the red zone of another tertiary hospital (AUC = 0.78). However, overall, AUC values greater than 0.90 were observed, indicating high accuracy.
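Cohen's kappa and the F1 score used above can be computed directly from paired triage labels. The sketch below shows both metrics on a small hypothetical label set (the labels are made up for illustration; they are not data from the study):

```python
from collections import Counter

def cohens_kappa(a, b):
    """Chance-corrected agreement between two raters over the same items:
    kappa = (p_o - p_e) / (1 - p_e)."""
    n = len(a)
    po = sum(x == y for x, y in zip(a, b)) / n           # observed agreement
    ca, cb = Counter(a), Counter(b)
    pe = sum(ca[k] * cb[k] for k in ca) / n ** 2         # chance agreement
    return (po - pe) / (1 - pe)

def macro_f1(gold, pred):
    """Unweighted mean of per-class F1 = 2*TP / (2*TP + FP + FN)."""
    scores = []
    for cls in set(gold) | set(pred):
        tp = sum(g == cls and p == cls for g, p in zip(gold, pred))
        fp = sum(g != cls and p == cls for g, p in zip(gold, pred))
        fn = sum(g == cls and p != cls for g, p in zip(gold, pred))
        scores.append(2 * tp / (2 * tp + fp + fn) if tp else 0.0)
    return sum(scores) / len(scores)

# Hypothetical zone assignments: gold standard vs. one rater.
gold = ["red", "yellow", "green", "green", "yellow", "red", "green", "yellow"]
team = ["red", "yellow", "green", "yellow", "yellow", "red", "green", "green"]

print(round(cohens_kappa(gold, team), 3))  # 0.619
print(round(macro_f1(gold, team), 3))      # 0.778
```

The study does not report which F1 averaging scheme was used; macro-averaging is shown here as one common choice for multi-class triage labels.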

Conclusion

ChatGPT generally outperformed triage personnel in patient triage across emergency departments with varying conditions, demonstrating high agreement with the gold standard decision. However, in tertiary hospitals, its performance was relatively lower in triaging patients with more complex symptoms, particularly those requiring triage to the yellow and red zones.
Journal metrics: CiteScore 6.00 · Self-citation rate 5.60% · Annual articles 730 · Time to review 42 days
About the journal: A distinctive blend of practicality and scholarliness makes the American Journal of Emergency Medicine a key source for information on emergency medical care. Covering all activities concerned with emergency medicine, it is the journal to turn to for information to help increase the ability to understand, recognize and treat emergency conditions. Issues contain clinical articles, case reports, review articles, editorials, international notes, book reviews and more.