Bridging the Gap in Neonatal Care: Evaluating AI Chatbots for Chronic Neonatal Lung Disease and Home Oxygen Therapy Management

Impact factor 2.7 · CAS Zone 3 (Medicine) · JCR Q1 (Pediatrics)
Weiqin Liu, Hong Wei, Lingling Xiang, Yin Liu, Chunyi Wang, Ziyu Hua
Pediatric Pulmonology, vol. 60, no. 3, e71020. Published 2025-03-01. Journal Article. DOI: 10.1002/ppul.71020 (https://doi.org/10.1002/ppul.71020)
Citations: 0

Abstract

Bridging the Gap in Neonatal Care: Evaluating AI Chatbots for Chronic Neonatal Lung Disease and Home Oxygen Therapy Management.

Objective: To evaluate the accuracy and comprehensiveness of eight free, publicly available large language model (LLM) chatbots in addressing common questions related to chronic neonatal lung disease (CNLD) and home oxygen therapy (HOT).

Study design: Twenty CNLD- and HOT-related questions were curated across nine domains. Responses from ChatGPT-3.5, Google Bard, Bing Chat, Claude 3.5 Sonnet, ERNIE Bot 3.5, and GLM-4 were generated and evaluated by three experienced neonatologists using Likert scales for accuracy and comprehensiveness. Two updated LLMs (ChatGPT-4o mini and Gemini 2.0 Flash Experimental) were then added to account for rapid technological advancement. Statistical analyses included ANOVA, Kruskal-Wallis tests, and intraclass correlation coefficients.

Results: Bing Chat and Claude 3.5 Sonnet demonstrated superior performance, with the highest mean accuracy scores (5.78 ± 0.48 and 5.75 ± 0.54, respectively) and competence scores (2.65 ± 0.58 and 2.80 ± 0.41, respectively). In subsequent testing, Gemini 2.0 Flash Experimental and ChatGPT-4o mini achieved comparably high performance. Performance varied across domains, with all models excelling in "equipment and safety protocols" and "caregiver support." ERNIE Bot 3.5 and GLM-4 showed self-correction capabilities when prompted.

Conclusions: LLMs show promise for delivering accurate CNLD/HOT information. However, performance variability and the risk of misinformation necessitate expert oversight and continued refinement before widespread clinical implementation.
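
The study design names Kruskal-Wallis tests among its analyses and reports scores as mean ± SD. As a rough illustration of what such a comparison involves, the sketch below computes a per-model mean ± SD and a tie-corrected Kruskal-Wallis H statistic using only the Python standard library. It is not the authors' analysis code: the model names (`model_a`, `model_b`, `model_c`) and every score are made-up placeholders, and the helper names `average_ranks` and `kruskal_wallis_h` are hypothetical. A p-value would additionally require a chi-square distribution with (number of groups - 1) degrees of freedom, which is omitted here.

```python
from collections import Counter
from statistics import mean, stdev


def average_ranks(values):
    """Rank values from 1..N; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value equals the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of 1-based positions in the run
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def kruskal_wallis_h(groups):
    """Return the tie-corrected Kruskal-Wallis H statistic for the groups."""
    pooled = [x for g in groups for x in g]
    n_total = len(pooled)
    ranks = average_ranks(pooled)

    # Sum of (rank sum squared / group size) over all groups.
    weighted = 0.0
    offset = 0
    for g in groups:
        rank_sum = sum(ranks[offset:offset + len(g)])
        weighted += rank_sum ** 2 / len(g)
        offset += len(g)
    h = 12.0 / (n_total * (n_total + 1)) * weighted - 3 * (n_total + 1)

    # Tie correction: 1 - sum(t^3 - t) / (N^3 - N), t = size of each tie group.
    ties = sum(t ** 3 - t for t in Counter(pooled).values())
    correction = 1 - ties / (n_total ** 3 - n_total)
    if correction == 0:
        raise ValueError("all observations are identical; H is undefined")
    return h / correction


if __name__ == "__main__":
    # Placeholder Likert-style scores, NOT data from the study.
    scores = {
        "model_a": [6, 6, 5, 6, 5, 6],
        "model_b": [5, 4, 5, 6, 4, 5],
        "model_c": [4, 3, 5, 4, 4, 3],
    }
    for name, vals in scores.items():
        print(f"{name}: {mean(vals):.2f} ± {stdev(vals):.2f}")
    print(f"H = {kruskal_wallis_h(list(scores.values())):.3f}")
```

Likert scores are ordinal and heavily tied, which is why the sketch uses average ranks and the tie correction rather than the uncorrected formula.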

Source journal: Pediatric Pulmonology (Medicine, Respiratory System)
CiteScore: 6.00
Self-citation rate: 12.90%
Articles published: 468
Review time: 3-8 weeks
Journal description: Pediatric Pulmonology (PPUL) is the foremost global journal studying the respiratory system in disease and in health as it develops from intrauterine life through adolescence to adulthood. Combining explicit and informative analysis of clinical as well as basic scientific research, PPUL provides a look at the many facets of respiratory system disorders in infants and children, ranging from pathological anatomy, developmental issues, and pathophysiology to infectious disease, asthma, cystic fibrosis, and airborne toxins. Focused attention is given to the reporting of diagnostic and therapeutic methods for neonates, preschool children, and adolescents, the enduring effects of childhood respiratory diseases, and newly described infectious diseases. PPUL concentrates on subject matters of crucial interest to specialists preparing for the Pediatric Subspecialty Examinations in the United States and other countries. With its attentive coverage and extensive clinical data, this journal is a principal source for pediatricians in practice and in training and a must-have for all pediatric pulmonologists.