{"title":"比较ChatGPT-4和儿科重症监护专家在回答医学教育问题:一项多中心评估。","authors":"Shai Yitzhaki, Nadav Peled, Eytan Kaplan, Gili Kadmon, Elhanan Nahum, Yulia Gendler, Avichai Weissbach","doi":"10.1111/jpc.70080","DOIUrl":null,"url":null,"abstract":"<div>\n \n \n <section>\n \n <h3> Objective</h3>\n \n <p>To compare the performance of the Generative Pre-trained Transformer model 4 (ChatGPT-4) with that of a paediatric intensive care unit (PICU) specialist in responding to open-ended medical education questions.</p>\n </section>\n \n <section>\n \n <h3> Methods</h3>\n \n <p>A comparative analysis was conducted using 100 educational questions sourced from a PICU trainee WhatsApp forum, covering factual knowledge and clinical reasoning. Ten PICU specialists from multiple tertiary paediatric centres independently evaluated 20 sets of paired responses from ChatGPT-4 and a PICU specialist (the original respondent to the forum questions), assessing overall superiority, completeness, accuracy, and integration potential.</p>\n </section>\n \n <section>\n \n <h3> Results</h3>\n \n <p>After excluding one question requiring a visual aid, 198 paired evaluations were made (96 factual knowledge and 102 clinical reasoning). ChatGPT-4's responses were significantly longer than those of the PICU specialist (median words: 189 vs. 41; <i>p</i> < 0.0001). ChatGPT-4 was preferred in 60% of factual knowledge comparisons (<i>p</i> < 0.001), while the PICU specialist's responses were preferred in 67% of clinical reasoning comparisons (<i>p</i> < 0.0001). ChatGPT-4 demonstrated superior completeness in factual knowledge (<i>p</i> = 0.02) but lower accuracy in clinical reasoning (<i>p</i> < 0.0001). Integration of both answers was favoured in 37% of cases (95% CI, 31%–44%).</p>\n </section>\n \n <section>\n \n <h3> Conclusions</h3>\n \n <p>ChatGPT-4 shows promise as a tool for factual medical education in the PICU, excelling in completeness. However, it requires oversight in clinical reasoning tasks, where the PICU specialist's responses remain superior. Expert review is essential before using ChatGPT-4 independently in PICU education and in other similarly underexplored medical fields.</p>\n </section>\n </div>","PeriodicalId":16648,"journal":{"name":"Journal of paediatrics and child health","volume":"61 7","pages":"1084-1089"},"PeriodicalIF":1.6000,"publicationDate":"2025-05-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Comparing ChatGPT-4 and a Paediatric Intensive Care Specialist in Responding to Medical Education Questions: A Multicenter Evaluation\",\"authors\":\"Shai Yitzhaki, Nadav Peled, Eytan Kaplan, Gili Kadmon, Elhanan Nahum, Yulia Gendler, Avichai Weissbach\",\"doi\":\"10.1111/jpc.70080\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div>\\n \\n \\n <section>\\n \\n <h3> Objective</h3>\\n \\n <p>To compare the performance of the Generative Pre-trained Transformer model 4 (ChatGPT-4) with that of a paediatric intensive care unit (PICU) specialist in responding to open-ended medical education questions.</p>\\n </section>\\n \\n <section>\\n \\n <h3> Methods</h3>\\n \\n <p>A comparative analysis was conducted using 100 educational questions sourced from a PICU trainee WhatsApp forum, covering factual knowledge and clinical reasoning. 
Ten PICU specialists from multiple tertiary paediatric centres independently evaluated 20 sets of paired responses from ChatGPT-4 and a PICU specialist (the original respondent to the forum questions), assessing overall superiority, completeness, accuracy, and integration potential.</p>\\n </section>\\n \\n <section>\\n \\n <h3> Results</h3>\\n \\n <p>After excluding one question requiring a visual aid, 198 paired evaluations were made (96 factual knowledge and 102 clinical reasoning). ChatGPT-4's responses were significantly longer than those of the PICU specialist (median words: 189 vs. 41; <i>p</i> < 0.0001). ChatGPT-4 was preferred in 60% of factual knowledge comparisons (<i>p</i> < 0.001), while the PICU specialist's responses were preferred in 67% of clinical reasoning comparisons (<i>p</i> < 0.0001). ChatGPT-4 demonstrated superior completeness in factual knowledge (<i>p</i> = 0.02) but lower accuracy in clinical reasoning (<i>p</i> < 0.0001). Integration of both answers was favoured in 37% of cases (95% CI, 31%–44%).</p>\\n </section>\\n \\n <section>\\n \\n <h3> Conclusions</h3>\\n \\n <p>ChatGPT-4 shows promise as a tool for factual medical education in the PICU, excelling in completeness. However, it requires oversight in clinical reasoning tasks, where the PICU specialist's responses remain superior. Expert review is essential before using ChatGPT-4 independently in PICU education and in other similarly underexplored medical fields.</p>\\n </section>\\n </div>\",\"PeriodicalId\":16648,\"journal\":{\"name\":\"Journal of paediatrics and child health\",\"volume\":\"61 7\",\"pages\":\"1084-1089\"},\"PeriodicalIF\":1.6000,\"publicationDate\":\"2025-05-07\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Journal of paediatrics and child health\",\"FirstCategoryId\":\"3\",\"ListUrlMain\":\"https://onlinelibrary.wiley.com/doi/10.1111/jpc.70080\",\"RegionNum\":4,\"RegionCategory\":\"医学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q2\",\"JCRName\":\"PEDIATRICS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of paediatrics and child health","FirstCategoryId":"3","ListUrlMain":"https://onlinelibrary.wiley.com/doi/10.1111/jpc.70080","RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"PEDIATRICS","Score":null,"Total":0}
Comparing ChatGPT-4 and a Paediatric Intensive Care Specialist in Responding to Medical Education Questions: A Multicenter Evaluation
Objective
To compare the performance of the Generative Pre-trained Transformer model 4 (ChatGPT-4) with that of a paediatric intensive care unit (PICU) specialist in responding to open-ended medical education questions.
Methods
A comparative analysis was conducted using 100 educational questions sourced from a PICU trainee WhatsApp forum, covering factual knowledge and clinical reasoning. Ten PICU specialists from multiple tertiary paediatric centres independently evaluated 20 sets of paired responses from ChatGPT-4 and a PICU specialist (the original respondent to the forum questions), assessing overall superiority, completeness, accuracy, and integration potential.
Results
After excluding one question requiring a visual aid, 198 paired evaluations were made (96 factual knowledge and 102 clinical reasoning). ChatGPT-4's responses were significantly longer than those of the PICU specialist (median words: 189 vs. 41; p < 0.0001). ChatGPT-4 was preferred in 60% of factual knowledge comparisons (p < 0.001), while the PICU specialist's responses were preferred in 67% of clinical reasoning comparisons (p < 0.0001). ChatGPT-4 demonstrated superior completeness in factual knowledge (p = 0.02) but lower accuracy in clinical reasoning (p < 0.0001). Integration of both answers was favoured in 37% of cases (95% CI, 31%–44%).
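The abstract does not report the raw count behind the 37% figure or the interval method used, but the 95% CI of 31%–44% is consistent with a standard binomial interval over the 198 paired evaluations. The snippet below is an illustrative sketch only, assuming a count of roughly 73/198 and a Wilson score interval; it is not the authors' stated method.

```python
import math

# Reported in the abstract: 37% of 198 paired evaluations favoured integrating
# both answers, 95% CI 31%-44%. Exact count and CI method are not stated, so
# this sketch assumes ~73/198 and a Wilson score interval for illustration.
n = 198
k = round(0.37 * n)          # assumed count (~73); not reported in the abstract
p_hat = k / n
z = 1.96                     # ~95% normal quantile

# Wilson score interval for a binomial proportion.
centre = (p_hat + z**2 / (2 * n)) / (1 + z**2 / n)
half = (z / (1 + z**2 / n)) * math.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2))
print(f"95% CI: {centre - half:.0%} to {centre + half:.0%}")  # ~30% to 44%, close to the reported 31%-44%
```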
Conclusions
ChatGPT-4 shows promise as a tool for factual medical education in the PICU, excelling in completeness. However, it requires oversight in clinical reasoning tasks, where the PICU specialist's responses remain superior. Expert review is essential before using ChatGPT-4 independently in PICU education and in other similarly underexplored medical fields.
About the Journal:
The Journal of Paediatrics and Child Health publishes original research articles of scientific excellence in paediatrics and child health. Research Articles, Case Reports and Letters to the Editor are published, together with invited Reviews, Annotations, Editorial Comments and manuscripts of educational interest.