Can large language models assist with pediatric dosing accuracy?
Chedva Levin, Brurya Orkaby, Erika Kerner, Mor Saban
Pediatric Research (Impact Factor 3.1, JCR Q1, Pediatrics). Published 2025-03-08. DOI: 10.1038/s41390-025-03980-8 (https://doi.org/10.1038/s41390-025-03980-8)
Citations: 0
Abstract
Background and objective: Medication errors in pediatric care remain a significant healthcare challenge despite technological advancements, necessitating innovative approaches. This study aims to evaluate Large Language Models' (LLMs) potential in reducing pediatric medication dosage calculation errors compared to experienced nurses.
Methods: This cross-sectional study (June-August 2024) involved 101 nurses from pediatric and neonatal departments and three LLMs (ChatGPT-4o, Claude-3.0, Llama 3 8B). Participants completed a nine-question survey on pediatric medication calculations. Primary outcomes were accuracy and response time. Secondary measures were the effects of seniority and group membership on accuracy.
Results: Significant differences (p < 0.001) were observed between nurses and LLMs. Nurses averaged 93.14 ± 9.39% accuracy. Claude-3.0 and ChatGPT-4o achieved 100% accuracy, while Llama 3 8B was 66% accurate. LLMs were faster (15.7-75.12 s) than nurses (1621.2 ± 8379.3 s). The Generalized Linear Model analysis revealed that task performance was significantly influenced by duration (Wald χ² = 27,881.261, p < 0.001) and by the interaction between relative seniority and group membership (Wald χ² = 3,938.250, p < 0.001), with participants achieving a mean total grade of 91.03 (SD = 13.87).
Conclusions: Claude-3.0 and ChatGPT-4o demonstrated perfect accuracy and rapid calculation capabilities, showing promise in reducing pediatric medication dosage errors. Further research is needed to explore their integration into practice.
Impact:
Key message: Large Language Models (LLMs) like ChatGPT-4o and Claude-3.0 demonstrate perfect accuracy and significantly faster response times in pediatric medication dosage calculations, showing potential to reduce errors and save time.
Addition to existing literature: This study provides novel insights by quantitatively comparing LLM performance with experienced nurses, contributing to the understanding of AI's role in improving medication safety.
Impact: The findings emphasize the value of LLMs as supplemental tools in healthcare, particularly in high-stakes pediatric care, where they can reduce calculation errors and improve clinical efficiency.
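As a reading aid for the accuracy figures in the Results, the sketch below lists the percentage scores reachable on a nine-question survey. It assumes every question carries equal weight, which the abstract does not state; the function name `accuracy_percent` is illustrative and not taken from the study. Under that assumption, the reported Llama 3 8B score of 66 would be consistent with six of nine correct answers, though the abstract does not report per-question results.

```python
N_QUESTIONS = 9  # stated in the abstract: nine-question survey


def accuracy_percent(correct: int, total: int = N_QUESTIONS) -> float:
    """Percentage of correctly answered questions (equal weighting is an assumption)."""
    if not 0 <= correct <= total:
        raise ValueError("correct must be between 0 and total")
    return 100.0 * correct / total


if __name__ == "__main__":
    # List every score reachable under the equal-weighting assumption.
    for correct in range(N_QUESTIONS + 1):
        print(f"{correct}/{N_QUESTIONS} correct -> {accuracy_percent(correct):.2f}%")
```

With so few questions, a single wrong answer moves an individual score by about 11 percentage points, which is worth keeping in mind when comparing a single LLM run against the average of 101 nurses.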
About the journal:
Pediatric Research publishes original papers, invited reviews, and commentaries on the etiologies of children's diseases and disorders of development, extending from molecular biology to epidemiology. Use of model organisms and in vitro techniques relevant to developmental biology and medicine is acceptable, as are translational human studies.