Bridging the Gap in Neonatal Care: Evaluating AI Chatbots for Chronic Neonatal Lung Disease and Home Oxygen Therapy Management.
Authors: Weiqin Liu, Hong Wei, Lingling Xiang, Yin Liu, Chunyi Wang, Ziyu Hua
Pediatric Pulmonology, vol. 60, no. 3, e71020 (published 2025-03-01). DOI: 10.1002/ppul.71020
Objective: To evaluate the accuracy and comprehensiveness of eight free, publicly available large language model (LLM) chatbots in addressing common questions related to chronic neonatal lung disease (CNLD) and home oxygen therapy (HOT).
Study design: Twenty CNLD- and HOT-related questions spanning nine domains were curated. Responses from ChatGPT-3.5, Google Bard, Bing Chat, Claude 3.5 Sonnet, ERNIE Bot 3.5, and GLM-4 were generated and evaluated by three experienced neonatologists using Likert scales for accuracy and comprehensiveness. Two updated models (ChatGPT-4o mini and Gemini 2.0 Flash Experimental) were subsequently added to assess the impact of rapid technological advancement. Statistical analyses included ANOVA, Kruskal-Wallis tests, and intraclass correlation coefficients.
Results: Bing Chat and Claude 3.5 Sonnet demonstrated superior performance, with the highest mean accuracy scores (5.78 ± 0.48 and 5.75 ± 0.54, respectively) and comprehensiveness scores (2.65 ± 0.58 and 2.80 ± 0.41, respectively). In subsequent testing, Gemini 2.0 Flash Experimental and ChatGPT-4o mini achieved comparably high performance. Performance varied across domains, with all models excelling in "equipment and safety protocols" and "caregiver support." ERNIE Bot 3.5 and GLM-4 showed self-correction capabilities when prompted.
Conclusions: LLMs show promise for providing accurate CNLD/HOT information. However, performance variability and the risk of misinformation necessitate expert oversight and continued refinement before widespread clinical implementation.
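The abstract names intraclass correlation coefficients and Kruskal-Wallis tests but does not say which ICC form or software the authors used. The following is an illustrative, stdlib-only Python sketch of the two statistics, assuming ICC(2,1) (two-way random effects, absolute agreement, single rater) for agreement among the three neonatologists and the tie-corrected Kruskal-Wallis H for comparing Likert scores across chatbots. The function names and input layouts (one row per rated response and one column per rater; one list of scores per chatbot) are hypothetical and do not reproduce the paper's actual analysis.

```python
from collections import Counter


def icc_2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    ratings[i][j] is rater j's score for item i (n items x k raters).
    """
    n, k = len(ratings), len(ratings[0])
    if n < 2 or k < 2:
        raise ValueError("need at least 2 items and 2 raters")
    grand = sum(x for row in ratings for x in row) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]
    # Two-way ANOVA sums of squares: items (rows), raters (columns), residual.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


def average_ranks(values):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for m in range(i, j + 1):
            ranks[order[m]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def kruskal_wallis_h(groups):
    """Tie-corrected Kruskal-Wallis H for a list of score lists (one per group)."""
    pooled = [x for g in groups for x in g]
    n_total = len(pooled)
    ranks = average_ranks(pooled)
    rank_term, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        rank_term += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12.0 / (n_total * (n_total + 1)) * rank_term - 3 * (n_total + 1)
    # Likert data are heavily tied, so divide by the standard tie correction.
    ties = sum(t ** 3 - t for t in Counter(pooled).values())
    correction = 1 - ties / (n_total ** 3 - n_total)
    if correction == 0:
        raise ValueError("all scores are identical; H is undefined")
    return h / correction
```

To obtain a p-value, H is compared with a chi-square distribution with (number of groups minus 1) degrees of freedom; `scipy.stats.kruskal` performs that step and applies the same tie correction, and `scipy.stats.f_oneway` covers the one-way ANOVA also listed in the study design.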
About the journal:
Pediatric Pulmonology (PPUL) is the foremost global journal studying the respiratory system in disease and in health as it develops from intrauterine life through adolescence to adulthood. Combining explicit and informative analysis of clinical as well as basic scientific research, PPUL provides a look at the many facets of respiratory system disorders in infants and children, ranging from pathological anatomy, developmental issues, and pathophysiology to infectious disease, asthma, cystic fibrosis, and airborne toxins. Focused attention is given to the reporting of diagnostic and therapeutic methods for neonates, preschool children, and adolescents, the enduring effects of childhood respiratory diseases, and newly described infectious diseases.
PPUL concentrates on subject matters of crucial interest to specialists preparing for the Pediatric Subspecialty Examinations in the United States and other countries. With its attentive coverage and extensive clinical data, this journal is a principal source for pediatricians in practice and in training and a must-have for all pediatric pulmonologists.