Parental education in pediatric dysphagia: A comparative analysis of three large language models

Bülent Alyanak, Burak Tayyip Dede, Fatih Bağcıer, Mazlum Serdar Akaltun

Journal of Pediatric Gastroenterology and Nutrition, published 2025-05-08. DOI: 10.1002/jpn3.70069
Abstract
Objectives: This study evaluates the effectiveness of three widely used large language models (LLMs), ChatGPT-4, Copilot, and Gemini, in providing accurate, reliable, and understandable answers to frequently asked questions about pediatric dysphagia.
Methods: Twenty-five questions, selected based on Google Trends data, were presented to ChatGPT-4, Copilot, and Gemini, and the responses were evaluated using a 5-point Likert scale for accuracy, the Ensuring Quality Information for Patients (EQIP) and DISCERN scales for information quality and reliability, and the Flesch-Kincaid Grade Level (FKGL) and Flesch Reading Ease (FRE) scores for readability. The performance of ChatGPT-4, Copilot, and Gemini was assessed by presenting the same set of questions at three different time points: August, September, and October 2024. Statistical analyses included analysis of variance, Kruskal-Wallis tests, and post hoc comparisons, with p values below 0.05 considered significant.
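The readability metrics and statistical tests named above are standard and straightforward to reproduce. The sketch below computes FRE and FKGL from their published formulas and runs the two group comparisons (one-way ANOVA and Kruskal-Wallis) with SciPy; the sample text, the rating arrays, and the vowel-group syllable counter are illustrative assumptions, not the study's data or tooling (published work typically uses a validated readability calculator rather than this heuristic).

```python
# Minimal sketch, assuming hypothetical data: Flesch Reading Ease (FRE),
# Flesch-Kincaid Grade Level (FKGL), and the group comparisons from Methods.
import re
from scipy.stats import f_oneway, kruskal

def count_syllables(word: str) -> int:
    """Crude vowel-group heuristic; a validated tool would be more accurate."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def readability(text: str) -> tuple[float, float]:
    """Return (FRE, FKGL) using the standard published formulas."""
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    wps = len(words) / sentences           # words per sentence
    spw = syllables / max(1, len(words))   # syllables per word
    fre = 206.835 - 1.015 * wps - 84.6 * spw
    fkgl = 0.39 * wps + 11.8 * spw - 15.59
    return fre, fkgl

# Hypothetical model response (placeholder, not a real LLM answer).
sample = ("Pediatric dysphagia means difficulty swallowing. "
          "Talk with your child's care team about safe feeding.")
fre, fkgl = readability(sample)
print(f"FRE = {fre:.1f}, FKGL = {fkgl:.1f}")

# Hypothetical 1-5 Likert accuracy ratings per model (the study scored 25
# questions at three time points; these short arrays are placeholders).
chatgpt4 = [4, 5, 4, 3, 5, 4, 4]
copilot  = [3, 3, 2, 4, 3, 3, 4]
gemini   = [4, 4, 3, 5, 4, 3, 4]

print(f_oneway(chatgpt4, copilot, gemini))  # parametric: one-way ANOVA
print(kruskal(chatgpt4, copilot, gemini))   # non-parametric alternative
```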
Results: ChatGPT-4 achieved the highest mean accuracy score (4.1 ± 0.7) compared to Copilot (3.1 ± 0.7) and Gemini (3.8 ± 0.8), with significant differences in these ratings relative to Copilot (p < 0.001) and Gemini (p < 0.05). EQIP and DISCERN scores further confirmed the superior performance of ChatGPT-4. In terms of readability, Gemini achieved the most favorable scores (FRE = 48.7 ± 9.9, FKGL = 10.1 ± 1.6).
Conclusions: While ChatGPT-4 generally provided more accurate and reliable information, Gemini produced more readable content. However, variability in overall information quality indicates that, although LLMs hold potential as tools for pediatric dysphagia education, further improvements are necessary to ensure consistent delivery of reliable and accessible information.
About the Journal
The Journal of Pediatric Gastroenterology and Nutrition (JPGN) provides a forum for original papers and reviews dealing with pediatric gastroenterology and nutrition, including normal and abnormal functions of the alimentary tract and its associated organs, such as the salivary glands, pancreas, gallbladder, and liver. Particular emphasis is on development and its relation to infant and childhood nutrition.