Impact Factor 1.2 | CAS Tier 4 (Medicine) | JCR Q3 (Emergency Medicine)
Pediatric Emergency Care, 2025;41(4):251-255. Pub Date: 2025-04-01 (Epub 2025-01-07). DOI: 10.1097/PEC.0000000000003315
Brandon Ho, Meng Lu, Xuan Wang, Russell Butler, Joshua Park, Dennis Ren
Citations: 0

Abstract

Evaluation of Generative Artificial Intelligence Models in Predicting Pediatric Emergency Severity Index Levels.

Objective: Evaluate the accuracy and reliability of various generative artificial intelligence (AI) models (ChatGPT-3.5, ChatGPT-4.0, T5, Llama-2, Mistral-Large, and Claude-3 Opus) in predicting Emergency Severity Index (ESI) levels for pediatric emergency department patients and assess the impact of medically oriented fine-tuning.

Methods: Seventy pediatric clinical vignettes from the ESI Handbook version 4 were used as the gold standard. Each AI model predicted the ESI level for each vignette. Performance metrics, including sensitivity, specificity, and F1 score, were calculated. Reliability was assessed by repeating the tests and measuring the interrater reliability using Fleiss kappa. Paired t tests were used to compare the models before and after fine-tuning.
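The summary statistics used in this study can be reproduced from raw model predictions. The sketch below (in Python, using hypothetical data, since the paper does not publish its scoring code) shows one plausible way to compute macro-averaged one-vs-rest sensitivity, specificity, and F1 across the five ESI levels, plus Fleiss' kappa over repeated runs of the same vignettes.

```python
def per_class_metrics(y_true, y_pred, labels):
    """One-vs-rest sensitivity, specificity, and F1 per ESI level,
    macro-averaged across levels. Inputs are parallel lists of
    gold-standard and predicted ESI levels."""
    sens, spec, f1 = [], [], []
    for c in labels:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        tn = len(y_true) - tp - fn - fp
        se = tp / (tp + fn) if tp + fn else 0.0        # sensitivity (recall)
        sp = tn / (tn + fp) if tn + fp else 0.0        # specificity
        pr = tp / (tp + fp) if tp + fp else 0.0        # precision
        sens.append(se)
        spec.append(sp)
        f1.append(2 * pr * se / (pr + se) if pr + se else 0.0)
    k = len(labels)
    return sum(sens) / k, sum(spec) / k, sum(f1) / k

def fleiss_kappa(ratings, categories):
    """Fleiss' kappa for N vignettes, each rated by the same number of
    repeated runs. `ratings` is a list of per-vignette lists of ESI levels,
    one entry per run."""
    n = len(ratings[0])   # runs per vignette
    N = len(ratings)      # number of vignettes
    # counts[i][j]: how many runs assigned vignette i to category j
    counts = [[row.count(c) for c in categories] for row in ratings]
    # per-vignette agreement, then chance-corrected
    P_i = [(sum(x * x for x in row) - n) / (n * (n - 1)) for row in counts]
    P_bar = sum(P_i) / N
    p_j = [sum(row[j] for row in counts) / (N * n) for j in range(len(categories))]
    P_e = sum(p * p for p in p_j)
    return (P_bar - P_e) / (1 - P_e)
```

For example, three vignettes each rated identically across three runs yield a kappa of 1.0 (perfect agreement), while partial disagreement pulls the value toward 0 after correcting for chance.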

Results: Claude-3 Opus achieved the highest performance among the untrained models, with a sensitivity of 80.6% (95% confidence interval [CI]: 63.6-90.7), specificity of 91.3% (95% CI: 83.8-99), and an F1 score of 73.9% (95% CI: 58.9-90.7). After fine-tuning, the GPT-4.0 model showed statistically significant improvement, with a sensitivity of 77.1% (95% CI: 60.1-86.5), specificity of 92.5% (95% CI: 89.5-97.4), and an F1 score of 74.6% (95% CI: 63.9-83.8, P < 0.04). Reliability analysis revealed high agreement for Claude-3 Opus (Fleiss κ: 0.85), followed by Mistral-Large (Fleiss κ: 0.79) and trained GPT-4.0 (Fleiss κ: 0.67). Training improved the reliability of GPT models (P < 0.001).

Conclusions: Generative AI models demonstrate promising accuracy in predicting pediatric ESI levels, with fine-tuning significantly enhancing their performance and reliability. These findings suggest that AI could serve as a valuable tool in pediatric triage.

Source journal: Pediatric Emergency Care (Medicine - Emergency Medicine)
CiteScore: 2.40
Self-citation rate: 14.30%
Articles per year: 577
Review time: 3-6 weeks
Journal description: Pediatric Emergency Care® features clinically relevant original articles with an EM perspective on the care of acutely ill or injured children and adolescents. The journal is aimed at both the pediatrician who wants to know more about treating and being compensated for minor emergency cases and the emergency physicians who must treat children or adolescents in their practice.