Comparing ChatGPT-4 and a Paediatric Intensive Care Specialist in Responding to Medical Education Questions: A Multicenter Evaluation

IF 1.6 | CAS Tier 4 (Medicine) | JCR Q2 (PEDIATRICS)
Shai Yitzhaki, Nadav Peled, Eytan Kaplan, Gili Kadmon, Elhanan Nahum, Yulia Gendler, Avichai Weissbach
{"title":"比较ChatGPT-4和儿科重症监护专家在回答医学教育问题:一项多中心评估。","authors":"Shai Yitzhaki,&nbsp;Nadav Peled,&nbsp;Eytan Kaplan,&nbsp;Gili Kadmon,&nbsp;Elhanan Nahum,&nbsp;Yulia Gendler,&nbsp;Avichai Weissbach","doi":"10.1111/jpc.70080","DOIUrl":null,"url":null,"abstract":"<div>\n \n \n <section>\n \n <h3> Objective</h3>\n \n <p>To compare the performance of the Generative Pre-trained Transformer model 4 (ChatGPT-4) with that of a paediatric intensive care unit (PICU) specialist in responding to open-ended medical education questions.</p>\n </section>\n \n <section>\n \n <h3> Methods</h3>\n \n <p>A comparative analysis was conducted using 100 educational questions sourced from a PICU trainee WhatsApp forum, covering factual knowledge and clinical reasoning. Ten PICU specialists from multiple tertiary paediatric centres independently evaluated 20 sets of paired responses from ChatGPT-4 and a PICU specialist (the original respondent to the forum questions), assessing overall superiority, completeness, accuracy, and integration potential.</p>\n </section>\n \n <section>\n \n <h3> Results</h3>\n \n <p>After excluding one question requiring a visual aid, 198 paired evaluations were made (96 factual knowledge and 102 clinical reasoning). ChatGPT-4's responses were significantly longer than those of the PICU specialist (median words: 189 vs. 41; <i>p</i> &lt; 0.0001). ChatGPT-4 was preferred in 60% of factual knowledge comparisons (<i>p</i> &lt; 0.001), while the PICU specialist's responses were preferred in 67% of clinical reasoning comparisons (<i>p</i> &lt; 0.0001). ChatGPT-4 demonstrated superior completeness in factual knowledge (<i>p</i> = 0.02) but lower accuracy in clinical reasoning (<i>p</i> &lt; 0.0001). Integration of both answers was favoured in 37% of cases (95% CI, 31%–44%).</p>\n </section>\n \n <section>\n \n <h3> Conclusions</h3>\n \n <p>ChatGPT-4 shows promise as a tool for factual medical education in the PICU, excelling in completeness. However, it requires oversight in clinical reasoning tasks, where the PICU specialist's responses remain superior. Expert review is essential before using ChatGPT-4 independently in PICU education and in other similarly underexplored medical fields.</p>\n </section>\n </div>","PeriodicalId":16648,"journal":{"name":"Journal of paediatrics and child health","volume":"61 7","pages":"1084-1089"},"PeriodicalIF":1.6000,"publicationDate":"2025-05-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Comparing ChatGPT-4 and a Paediatric Intensive Care Specialist in Responding to Medical Education Questions: A Multicenter Evaluation\",\"authors\":\"Shai Yitzhaki,&nbsp;Nadav Peled,&nbsp;Eytan Kaplan,&nbsp;Gili Kadmon,&nbsp;Elhanan Nahum,&nbsp;Yulia Gendler,&nbsp;Avichai Weissbach\",\"doi\":\"10.1111/jpc.70080\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div>\\n \\n \\n <section>\\n \\n <h3> Objective</h3>\\n \\n <p>To compare the performance of the Generative Pre-trained Transformer model 4 (ChatGPT-4) with that of a paediatric intensive care unit (PICU) specialist in responding to open-ended medical education questions.</p>\\n </section>\\n \\n <section>\\n \\n <h3> Methods</h3>\\n \\n <p>A comparative analysis was conducted using 100 educational questions sourced from a PICU trainee WhatsApp forum, covering factual knowledge and clinical reasoning. 
Ten PICU specialists from multiple tertiary paediatric centres independently evaluated 20 sets of paired responses from ChatGPT-4 and a PICU specialist (the original respondent to the forum questions), assessing overall superiority, completeness, accuracy, and integration potential.</p>\\n </section>\\n \\n <section>\\n \\n <h3> Results</h3>\\n \\n <p>After excluding one question requiring a visual aid, 198 paired evaluations were made (96 factual knowledge and 102 clinical reasoning). ChatGPT-4's responses were significantly longer than those of the PICU specialist (median words: 189 vs. 41; <i>p</i> &lt; 0.0001). ChatGPT-4 was preferred in 60% of factual knowledge comparisons (<i>p</i> &lt; 0.001), while the PICU specialist's responses were preferred in 67% of clinical reasoning comparisons (<i>p</i> &lt; 0.0001). ChatGPT-4 demonstrated superior completeness in factual knowledge (<i>p</i> = 0.02) but lower accuracy in clinical reasoning (<i>p</i> &lt; 0.0001). Integration of both answers was favoured in 37% of cases (95% CI, 31%–44%).</p>\\n </section>\\n \\n <section>\\n \\n <h3> Conclusions</h3>\\n \\n <p>ChatGPT-4 shows promise as a tool for factual medical education in the PICU, excelling in completeness. However, it requires oversight in clinical reasoning tasks, where the PICU specialist's responses remain superior. Expert review is essential before using ChatGPT-4 independently in PICU education and in other similarly underexplored medical fields.</p>\\n </section>\\n </div>\",\"PeriodicalId\":16648,\"journal\":{\"name\":\"Journal of paediatrics and child health\",\"volume\":\"61 7\",\"pages\":\"1084-1089\"},\"PeriodicalIF\":1.6000,\"publicationDate\":\"2025-05-07\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Journal of paediatrics and child health\",\"FirstCategoryId\":\"3\",\"ListUrlMain\":\"https://onlinelibrary.wiley.com/doi/10.1111/jpc.70080\",\"RegionNum\":4,\"RegionCategory\":\"医学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q2\",\"JCRName\":\"PEDIATRICS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of paediatrics and child health","FirstCategoryId":"3","ListUrlMain":"https://onlinelibrary.wiley.com/doi/10.1111/jpc.70080","RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"PEDIATRICS","Score":null,"Total":0}
Citations: 0

Abstract


Objective

To compare the performance of the Generative Pre-trained Transformer model 4 (ChatGPT-4) with that of a paediatric intensive care unit (PICU) specialist in responding to open-ended medical education questions.

Methods

A comparative analysis was conducted using 100 educational questions sourced from a PICU trainee WhatsApp forum, covering factual knowledge and clinical reasoning. Ten PICU specialists from multiple tertiary paediatric centres independently evaluated 20 sets of paired responses from ChatGPT-4 and a PICU specialist (the original respondent to the forum questions), assessing overall superiority, completeness, accuracy, and integration potential.
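
These counts imply a balanced design: 100 questions, each independently evaluated by two of the ten specialists, gives 200 paired evaluations at 20 per rater, which is consistent with the 198 remaining after one question was excluded (see Results). Below is a minimal Python sketch of one allocation scheme that satisfies those constraints; the abstract does not describe how the 20-question sets were actually assigned, so the round-robin rule here is purely illustrative:

```python
# Hypothetical allocation: 100 questions, 10 raters, each question seen by
# two different raters, each rater receiving exactly 20 distinct questions.
assignments = {rater: [] for rater in range(10)}
for q in range(100):
    assignments[q % 10].append(q)        # first evaluation of question q
    assignments[(q + 1) % 10].append(q)  # second evaluation, by a different rater

assert all(len(qs) == 20 for qs in assignments.values())
assert all(len(set(qs)) == 20 for qs in assignments.values())  # no rater sees a question twice
```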

Results

After excluding one question requiring a visual aid, 198 paired evaluations were made (96 factual knowledge and 102 clinical reasoning). ChatGPT-4's responses were significantly longer than those of the PICU specialist (median words: 189 vs. 41; p < 0.0001). ChatGPT-4 was preferred in 60% of factual knowledge comparisons (p < 0.001), while the PICU specialist's responses were preferred in 67% of clinical reasoning comparisons (p < 0.0001). ChatGPT-4 demonstrated superior completeness in factual knowledge (p = 0.02) but lower accuracy in clinical reasoning (p < 0.0001). Integration of both answers was favoured in 37% of cases (95% CI, 31%–44%).
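
As a worked check on the reported interval, 37% of 198 paired evaluations corresponds to roughly 73 votes favouring integration, and a standard binomial confidence interval on that count comes out close to the published 31%–44%. A minimal sketch using scipy; the count is a rounded reconstruction from the reported percentage, and the paper's CI method is not stated, so small discrepancies are expected:

```python
from scipy.stats import binomtest

# 73/198 ~= 37%: count reconstructed from the reported proportion.
result = binomtest(k=73, n=198)

# Clopper-Pearson (exact) interval, scipy's default method.
ci = result.proportion_ci(confidence_level=0.95)
print(f"point estimate: {result.statistic:.0%}")  # 37%
print(f"95% CI: {ci.low:.1%} to {ci.high:.1%}")   # roughly 30% to 44%
```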

Conclusions

ChatGPT-4 shows promise as a tool for factual medical education in the PICU, excelling in completeness. However, it requires oversight in clinical reasoning tasks, where the PICU specialist's responses remain superior. Expert review is essential before using ChatGPT-4 independently in PICU education and in other similarly underexplored medical fields.

Source journal
CiteScore: 2.90
Self-citation rate: 5.90%
Publication volume: 487
Review turnaround: 3-6 weeks
Journal introduction: The Journal of Paediatrics and Child Health publishes original research articles of scientific excellence in paediatrics and child health. Research Articles, Case Reports and Letters to the Editor are published, together with invited Reviews, Annotations, Editorial Comments and manuscripts of educational interest.