Can large language models assist with pediatric dosing accuracy?

IF 3.1 3区医学 Q1 PEDIATRICS

Pediatric Research Pub Date : 2025-03-08 DOI:10.1038/s41390-025-03980-8

Chedva Levin, Brurya Orkaby, Erika Kerner, Mor Saban

{"title":"Can large language models assist with pediatric dosing accuracy?","authors":"Chedva Levin, Brurya Orkaby, Erika Kerner, Mor Saban","doi":"10.1038/s41390-025-03980-8","DOIUrl":null,"url":null,"abstract":"Background and objective: Medication errors in pediatric care remain a significant healthcare challenge despite technological advancements, necessitating innovative approaches. This study aims to evaluate Large Language Models' (LLMs) potential in reducing pediatric medication dosage calculation errors compared to experienced nurses.Methods: This cross-sectional study (June-August 2024) involved 101 nurses from pediatric and neonatal departments and three LLMs (ChatGPT-4o, Claude-3.0, Llama 3 8B). Participants completed a nine-question survey on pediatric medication calculations. Primary outcomes were accuracy and response time. Secondary measures included seniority and group membership on accuracy.Results: Significant differences (P < 0.001) were observed between nurses and LLMs. Nurses averaged 93.14 ± 9.39 accuracy. Claude-3.0 and ChatGPT-4o achieved 100 accuracy, while Llama 3 8B was 66 accurate. LLMs were faster (15.7-75.12 seconds) than nurses (1621.2 ± 8379.3 s). The Generalized Linear Model analysis revealed task performance was significantly influenced by duration (Wald χ² = 27,881.261, p < 0.001) and interaction between relative seniority and group membership (Wald χ² = 3,938.250, p < 0.001), with participants achieving a mean total grade of 91.03 (SD = 13.87).Conclusions: Claude-3.0 and ChatGPT-4o demonstrated perfect accuracy and rapid calculation capabilities, showing promise in reducing pediatric medication dosage errors. Further research is needed to explore their integration into practice.Impact: Key Message Large Language Models (LLMs) like ChatGPT-4o and Claude-3.0 demonstrate perfect accuracy and significantly faster response times in pediatric medication dosage calculations, showing potential to reduce errors and save time. Addition to Existing Literature This study provides novel insights by quantitatively comparing LLM performance with experienced nurses, contributing to the understanding of AI's role in improving medication safety. Impact The findings emphasize the value of LLMs as supplemental tools in healthcare, particularly in high-stakes pediatric care, where they can reduce calculation errors and improve clinical efficiency.","PeriodicalId":19829,"journal":{"name":"Pediatric Research","volume":" ","pages":""},"PeriodicalIF":3.1000,"publicationDate":"2025-03-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Pediatric Research","FirstCategoryId":"3","ListUrlMain":"https://doi.org/10.1038/s41390-025-03980-8","RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"PEDIATRICS","Score":null,"Total":0}

引用次数: 0

Abstract

Background and objective: Medication errors in pediatric care remain a significant healthcare challenge despite technological advancements, necessitating innovative approaches. This study aims to evaluate Large Language Models' (LLMs) potential in reducing pediatric medication dosage calculation errors compared to experienced nurses.

Methods: This cross-sectional study (June-August 2024) involved 101 nurses from pediatric and neonatal departments and three LLMs (ChatGPT-4o, Claude-3.0, Llama 3 8B). Participants completed a nine-question survey on pediatric medication calculations. Primary outcomes were accuracy and response time. Secondary measures included seniority and group membership on accuracy.

Results: Significant differences (P < 0.001) were observed between nurses and LLMs. Nurses averaged 93.14 ± 9.39 accuracy. Claude-3.0 and ChatGPT-4o achieved 100 accuracy, while Llama 3 8B was 66 accurate. LLMs were faster (15.7-75.12 seconds) than nurses (1621.2 ± 8379.3 s). The Generalized Linear Model analysis revealed task performance was significantly influenced by duration (Wald χ² = 27,881.261, p < 0.001) and interaction between relative seniority and group membership (Wald χ² = 3,938.250, p < 0.001), with participants achieving a mean total grade of 91.03 (SD = 13.87).

Conclusions: Claude-3.0 and ChatGPT-4o demonstrated perfect accuracy and rapid calculation capabilities, showing promise in reducing pediatric medication dosage errors. Further research is needed to explore their integration into practice.

Impact: Key Message Large Language Models (LLMs) like ChatGPT-4o and Claude-3.0 demonstrate perfect accuracy and significantly faster response times in pediatric medication dosage calculations, showing potential to reduce errors and save time. Addition to Existing Literature This study provides novel insights by quantitatively comparing LLM performance with experienced nurses, contributing to the understanding of AI's role in improving medication safety. Impact The findings emphasize the value of LLMs as supplemental tools in healthcare, particularly in high-stakes pediatric care, where they can reduce calculation errors and improve clinical efficiency.

查看原文本刊更多论文

求助全文

约1分钟内获得全文求助全文

来源期刊

Pediatric Research 医学-小儿科

CiteScore

6.80

自引率

5.60%

发文量

473

审稿时长

3-8 weeks

期刊介绍： Pediatric Research publishes original papers, invited reviews, and commentaries on the etiologies of children''s diseases and disorders of development, extending from molecular biology to epidemiology. Use of model organisms and in vitro techniques relevant to developmental biology and medicine are acceptable, as are translational human studies