Sinan Pasli, Metin Yadigaroğlu, Esma Nilay Kirimli, Muhammet Fatih Beşer, İhsan Unutmaz, Asu Özden Ayhan, Büşra Karakurt, Abdul Samet Şahin, Halil İbrahim Hiçyilmaz, Melih Imamoğlu
{"title":"chatgpt支持的急诊科语音分诊:一项前瞻性多中心研究","authors":"Sinan Pasli , Metin Yadigaroğlu , Esma Nilay Kirimli , Muhammet Fatih Beşer , İhsan Unutmaz , Asu Özden Ayhan , Büşra Karakurt , Abdul Samet Şahin , Halil İbrahim Hiçyilmaz , Melih Imamoğlu","doi":"10.1016/j.ajem.2025.04.040","DOIUrl":null,"url":null,"abstract":"<div><h3>Background</h3><div>Triage aims to prioritize patients according to their medical urgency by accurately evaluating their clinical conditions, managing waiting times efficiently, and improving the overall effectiveness of emergency care. This study aims to assess ChatGPT's performance in patient triage across four emergency departments with varying dynamics and to provide a detailed analysis of its strengths and weaknesses.</div></div><div><h3>Methods</h3><div>In this multicenter, prospective study, we compared the triage decisions made by ChatGPT-4o and the triage personnel with the gold standard decisions determined by an emergency medicine (EM) specialist. In the hospitals where we conducted the study, triage teams routinely direct patients to the appropriate ED areas based on the Emergency Severity Index (ESI) system and the hospital's local triage protocols. During the study period, the triage team collected patient data, including chief complaints, comorbidities, and vital signs, and used this information to make the initial triage decisions. An independent physician simultaneously entered the same data into ChatGPT using voice commands. At the same time, an EM specialist, present in the triage room throughout the study period, reviewed the same patient data and determined the gold standard triage decisions, strictly adhering to both the hospital's local protocols and the ESI system. Before initiating the study, we customized ChatGPT for each hospital by designing prompts that incorporated both the general principles of the ESI triage system and the specific triage rules of each hospital. The model's overall, hospital-based, and area-based performance was evaluated, with Cohen's Kappa, F1 score, and performance analyses conducted.</div></div><div><h3>Results</h3><div>This study included 6657 patients. The overall agreement between triage personnel and GPT-4o with the gold standard was nearly perfect (Cohen's kappa = 0.782 and 0.833, respectively). The overall F1 score was 0.863 for the triage team, while GPT-4 achieved an F1 score of 0.897, demonstrating superior performance. ROC curve analysis showed the lowest performance in the yellow zone of a tertiary hospital (AUC = 0.75) and in the red zone of another tertiary hospital (AUC = 0.78). However, overall, AUC values greater than 0.90 were observed, indicating high accuracy.</div></div><div><h3>Conclusion</h3><div>ChatGPT generally outperformed triage personnel in patient triage across emergency departments with varying conditions, demonstrating high agreement with the gold standard decision. 
However, in tertiary hospitals, its performance was relatively lower in triaging patients with more complex symptoms, particularly those requiring triage to the yellow and red zones.</div></div>","PeriodicalId":55536,"journal":{"name":"American Journal of Emergency Medicine","volume":"94 ","pages":"Pages 63-70"},"PeriodicalIF":2.7000,"publicationDate":"2025-04-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"ChatGPT-supported patient triage with voice commands in the emergency department: A prospective multicenter study\",\"authors\":\"Sinan Pasli , Metin Yadigaroğlu , Esma Nilay Kirimli , Muhammet Fatih Beşer , İhsan Unutmaz , Asu Özden Ayhan , Büşra Karakurt , Abdul Samet Şahin , Halil İbrahim Hiçyilmaz , Melih Imamoğlu\",\"doi\":\"10.1016/j.ajem.2025.04.040\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><h3>Background</h3><div>Triage aims to prioritize patients according to their medical urgency by accurately evaluating their clinical conditions, managing waiting times efficiently, and improving the overall effectiveness of emergency care. This study aims to assess ChatGPT's performance in patient triage across four emergency departments with varying dynamics and to provide a detailed analysis of its strengths and weaknesses.</div></div><div><h3>Methods</h3><div>In this multicenter, prospective study, we compared the triage decisions made by ChatGPT-4o and the triage personnel with the gold standard decisions determined by an emergency medicine (EM) specialist. In the hospitals where we conducted the study, triage teams routinely direct patients to the appropriate ED areas based on the Emergency Severity Index (ESI) system and the hospital's local triage protocols. During the study period, the triage team collected patient data, including chief complaints, comorbidities, and vital signs, and used this information to make the initial triage decisions. An independent physician simultaneously entered the same data into ChatGPT using voice commands. At the same time, an EM specialist, present in the triage room throughout the study period, reviewed the same patient data and determined the gold standard triage decisions, strictly adhering to both the hospital's local protocols and the ESI system. Before initiating the study, we customized ChatGPT for each hospital by designing prompts that incorporated both the general principles of the ESI triage system and the specific triage rules of each hospital. The model's overall, hospital-based, and area-based performance was evaluated, with Cohen's Kappa, F1 score, and performance analyses conducted.</div></div><div><h3>Results</h3><div>This study included 6657 patients. The overall agreement between triage personnel and GPT-4o with the gold standard was nearly perfect (Cohen's kappa = 0.782 and 0.833, respectively). The overall F1 score was 0.863 for the triage team, while GPT-4 achieved an F1 score of 0.897, demonstrating superior performance. ROC curve analysis showed the lowest performance in the yellow zone of a tertiary hospital (AUC = 0.75) and in the red zone of another tertiary hospital (AUC = 0.78). However, overall, AUC values greater than 0.90 were observed, indicating high accuracy.</div></div><div><h3>Conclusion</h3><div>ChatGPT generally outperformed triage personnel in patient triage across emergency departments with varying conditions, demonstrating high agreement with the gold standard decision. 
However, in tertiary hospitals, its performance was relatively lower in triaging patients with more complex symptoms, particularly those requiring triage to the yellow and red zones.</div></div>\",\"PeriodicalId\":55536,\"journal\":{\"name\":\"American Journal of Emergency Medicine\",\"volume\":\"94 \",\"pages\":\"Pages 63-70\"},\"PeriodicalIF\":2.7000,\"publicationDate\":\"2025-04-17\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"American Journal of Emergency Medicine\",\"FirstCategoryId\":\"3\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S0735675725002803\",\"RegionNum\":3,\"RegionCategory\":\"医学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"EMERGENCY MEDICINE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"American Journal of Emergency Medicine","FirstCategoryId":"3","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0735675725002803","RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"EMERGENCY MEDICINE","Score":null,"Total":0}
ChatGPT-supported patient triage with voice commands in the emergency department: A prospective multicenter study
Background
Triage aims to prioritize patients according to medical urgency by accurately evaluating their clinical condition, thereby managing waiting times efficiently and improving the overall effectiveness of emergency care. This study aims to assess ChatGPT's performance in patient triage across four emergency departments with differing operational dynamics and to provide a detailed analysis of its strengths and weaknesses.
Methods
In this multicenter, prospective study, we compared the triage decisions made by ChatGPT (GPT-4o) and by triage personnel against the gold-standard decisions of an emergency medicine (EM) specialist. In the participating hospitals, triage teams routinely direct patients to the appropriate ED zones based on the Emergency Severity Index (ESI) system and each hospital's local triage protocols. During the study period, the triage team collected patient data, including chief complaints, comorbidities, and vital signs, and used this information to make the initial triage decisions. An independent physician simultaneously entered the same data into ChatGPT using voice commands. In parallel, an EM specialist, present in the triage room throughout the study period, reviewed the same patient data and determined the gold-standard triage decisions, strictly adhering to both the hospital's local protocols and the ESI system. Before initiating the study, we customized ChatGPT for each hospital with prompts that incorporated both the general principles of the ESI triage system and each hospital's specific triage rules. The model's overall, per-hospital, and per-zone performance was evaluated using Cohen's kappa, F1 scores, and ROC curve analyses.
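To illustrate the agreement and performance metrics named above, the minimal sketch below computes Cohen's kappa and F1 scores against gold-standard zone labels using scikit-learn. The label arrays are hypothetical stand-ins for the ESI-based zone assignments, and the weighted averaging scheme is an assumption; the paper does not specify the exact computation.

```python
# Minimal sketch, not the authors' code: agreement (Cohen's kappa) and
# performance (F1) of two raters against gold-standard triage zones.
from sklearn.metrics import cohen_kappa_score, f1_score

# Hypothetical zone assignments for five patients.
gold  = ["red", "yellow", "green", "yellow", "green"]   # EM specialist (gold standard)
team  = ["red", "yellow", "yellow", "yellow", "green"]  # triage personnel
gpt4o = ["red", "yellow", "green", "yellow", "green"]   # GPT-4o

# Cohen's kappa: chance-corrected agreement with the gold standard.
print("kappa (team):  ", cohen_kappa_score(gold, team))
print("kappa (GPT-4o):", cohen_kappa_score(gold, gpt4o))

# F1 aggregated across zones; "weighted" averaging is an assumption here.
print("F1 (team):  ", f1_score(gold, team, average="weighted"))
print("F1 (GPT-4o):", f1_score(gold, gpt4o, average="weighted"))
```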
Results
The study included 6657 patients. Agreement with the gold standard was substantial for the triage personnel and almost perfect for GPT-4o (Cohen's kappa = 0.782 and 0.833, respectively). The overall F1 score was 0.863 for the triage team, while GPT-4o achieved 0.897, demonstrating superior performance. ROC curve analysis showed the lowest performance in the yellow zone of one tertiary hospital (AUC = 0.75) and in the red zone of another tertiary hospital (AUC = 0.78). Overall, however, AUC values exceeded 0.90, indicating high accuracy.
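As a sketch of how zone-level AUC values like those above could be derived from hard triage assignments, the snippet below binarizes each zone one-vs-rest and scores it; the label arrays are hypothetical, and with hard labels the ROC reduces to a single operating point rather than a full curve.

```python
# Minimal sketch, an assumption rather than the paper's analysis code:
# one-vs-rest ROC AUC per triage zone from hard label assignments.
import numpy as np
from sklearn.metrics import roc_auc_score

gold  = np.array(["red", "yellow", "green", "yellow", "green", "red"])
gpt4o = np.array(["red", "yellow", "green", "green", "green", "yellow"])

for zone in ("red", "yellow", "green"):
    # Binarize: 1 if the rater assigns this zone, else 0.
    y_true = (gold == zone).astype(int)
    y_pred = (gpt4o == zone).astype(int)
    print(f"AUC ({zone} zone): {roc_auc_score(y_true, y_pred):.2f}")
```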
Conclusion
ChatGPT generally outperformed triage personnel in patient triage across emergency departments with varying conditions, demonstrating high agreement with the gold-standard decisions. However, in the tertiary hospitals its performance was lower when triaging patients with more complex presentations, particularly those requiring assignment to the yellow and red zones.
About the journal:
A distinctive blend of practicality and scholarship makes the American Journal of Emergency Medicine a key source of information on emergency medical care. Covering all activities concerned with emergency medicine, it is the journal to turn to for help in understanding, recognizing, and treating emergency conditions. Issues contain clinical articles, case reports, review articles, editorials, international notes, book reviews, and more.