Parental education in pediatric dysphagia: A comparative analysis of three large language models

Bülent Alyanak, Burak Tayyip Dede, Fatih Bağcıer, Mazlum Serdar Akaltun

Journal of Pediatric Gastroenterology and Nutrition, published 2025-05-08. DOI: 10.1002/jpn3.70069
Abstract
Objectives: This study evaluates the effectiveness of three widely used large language models (LLMs), ChatGPT-4, Copilot, and Gemini, in providing accurate, reliable, and understandable answers to frequently asked questions about pediatric dysphagia.
Methods: Twenty-five questions, selected based on Google Trends data, were presented to ChatGPT-4, Copilot, and Gemini, and the responses were evaluated using a 5-point Likert scale for accuracy, the Ensuring Quality Information for Patients (EQIP) and DISCERN scales for information quality and reliability, and the Flesch-Kincaid Grade Level (FKGL) and Flesch Reading Ease (FRE) scores for readability. The performance of ChatGPT-4, Copilot, and Gemini was assessed by presenting the same set of questions at three different time points: August, September, and October 2024. Statistical analyses included analysis of variance, Kruskal-Wallis tests, and post hoc comparisons, with p values below 0.05 considered significant.
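The readability metrics and statistical tests named above are standard and straightforward to reproduce. The sketch below computes FRE and FKGL from their published formulas and runs the two group comparisons (one-way ANOVA and Kruskal-Wallis) with SciPy; the sample text, the rating arrays, and the vowel-group syllable counter are illustrative assumptions, not the study's data or tooling (published work typically uses a validated readability calculator rather than this heuristic).

```python
# Minimal sketch, assuming hypothetical data: Flesch Reading Ease (FRE),
# Flesch-Kincaid Grade Level (FKGL), and the group comparisons from Methods.
import re
from scipy.stats import f_oneway, kruskal

def count_syllables(word: str) -> int:
    """Crude vowel-group heuristic; a validated tool would be more accurate."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def readability(text: str) -> tuple[float, float]:
    """Return (FRE, FKGL) using the standard published formulas."""
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    wps = len(words) / sentences           # words per sentence
    spw = syllables / max(1, len(words))   # syllables per word
    fre = 206.835 - 1.015 * wps - 84.6 * spw
    fkgl = 0.39 * wps + 11.8 * spw - 15.59
    return fre, fkgl

# Hypothetical model response (placeholder, not a real LLM answer).
sample = ("Pediatric dysphagia means difficulty swallowing. "
          "Talk with your child's care team about safe feeding.")
fre, fkgl = readability(sample)
print(f"FRE = {fre:.1f}, FKGL = {fkgl:.1f}")

# Hypothetical 1-5 Likert accuracy ratings per model (the study scored 25
# questions at three time points; these short arrays are placeholders).
chatgpt4 = [4, 5, 4, 3, 5, 4, 4]
copilot  = [3, 3, 2, 4, 3, 3, 4]
gemini   = [4, 4, 3, 5, 4, 3, 4]

print(f_oneway(chatgpt4, copilot, gemini))  # parametric: one-way ANOVA
print(kruskal(chatgpt4, copilot, gemini))   # non-parametric alternative
```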
Results: ChatGPT-4 achieved the highest mean accuracy score (4.1 ± 0.7) compared to Copilot (3.1 ± 0.7) and Gemini (3.8 ± 0.8), with significant differences in these ratings relative to Copilot (p < 0.001) and Gemini (p < 0.05). EQIP and DISCERN scores further confirmed the superior performance of ChatGPT-4. In terms of readability, Gemini achieved the most favorable scores (FRE = 48.7 ± 9.9, FKGL = 10.1 ± 1.6).
Conclusions: While ChatGPT-4 generally provided more accurate and reliable information, Gemini produced more readable content. However, variability in overall information quality indicates that, although LLMs hold potential as tools for pediatric dysphagia education, further improvements are necessary to ensure consistent delivery of reliable and accessible information.
About the Journal
The Journal of Pediatric Gastroenterology and Nutrition (JPGN) provides a forum for original papers and reviews dealing with pediatric gastroenterology and nutrition, including normal and abnormal functions of the alimentary tract and its associated organs, such as the salivary glands, pancreas, gallbladder, and liver. Particular emphasis is on development and its relation to infant and childhood nutrition.