Evaluating ChatGPT's efficacy and readability to common pediatric ophthalmology and strabismus-related questions.

IF 1.4 · JCR Q3 (Ophthalmology) · CAS Region 4 (Medicine)
European Journal of Ophthalmology · Pub Date: 2025-03-01 · Epub Date: 2024-08-07 · DOI: 10.1177/11206721241272251
H Shafeeq Ahmed, Chinmayee J Thrishulamurthy

Abstract

Introduction: The rise in popularity of chatbots among the general public, particularly OpenAI's ChatGPT, and their utility in the healthcare field are topics of ongoing controversy. The current study aimed to assess the reliability and accuracy of ChatGPT's responses to inquiries posed by parents, focusing on a range of pediatric ophthalmological and strabismus conditions.

Methods: Patient queries were collected via a thematic analysis and posed to ChatGPT (version 3.5), each across three separate instances. The questions were divided into 12 domains, totalling 817 unique questions. Two experienced pediatric ophthalmologists scored the quality of each response on a Likert scale. All responses were evaluated for readability using the Flesch-Kincaid Grade Level (FKGL) and character counts.
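The FKGL metric referenced here combines average sentence length and average syllables per word (0.39 × words/sentence + 11.8 × syllables/word − 15.59). A minimal sketch of the computation is shown below; the naive vowel-group syllable counter is an assumption for illustration, since published readability tools use more refined syllable estimation:

```python
import re

def count_syllables(word: str) -> int:
    """Rough syllable estimate: count contiguous vowel groups (approximation)."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))

def fkgl(text: str) -> float:
    """Flesch-Kincaid Grade Level:
    0.39 * (words/sentences) + 11.8 * (syllables/words) - 15.59
    """
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return (0.39 * (len(words) / len(sentences))
            + 11.8 * (syllables / len(words))
            - 15.59)
```

A score of 14.49, as reported below, corresponds roughly to college sophomore-level text, well above the sixth-to-eighth-grade level usually recommended for patient materials.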

Results: A total of 638 (78.09%) responses were scored as perfectly correct, 156 (19.09%) as correct but incomplete, and only 23 (2.81%) as partially incorrect. None of the responses were scored as completely incorrect. The average FKGL score was 14.49 [95% CI 14.4004-14.5854] and the average character count was 1825.33 [95% CI 1791.95-1858.7], with p = 0.831 and 0.697, respectively. The minimum and maximum FKGL scores were 10.6 and 18.34, respectively. FKGL predicted character count: R² = .012, F(1,815) = 10.26, p = .001.
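The reported regression (FKGL predicting character count, R² = .012) is an ordinary least-squares fit with a single predictor. An illustrative sketch of such a fit follows, on made-up data, not the study's data:

```python
def ols_fit(x, y):
    """Simple OLS of y on x; returns (slope, intercept, R^2)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    ss_tot = sum((yi - my) ** 2 for yi in y)
    ss_res = sum((yi - (intercept + slope * xi)) ** 2
                 for xi, yi in zip(x, y))
    r2 = 1 - ss_res / ss_tot
    return slope, intercept, r2
```

An R² of .012 means FKGL explains only about 1.2% of the variance in character count, so the association, while statistically significant, is weak.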

Conclusion: ChatGPT provided accurate and reliable information for the majority of the questions. However, the readability of its responses was well above the standards typically recommended for adult patient materials, which is concerning. Despite these limitations, it is evident that this technology will play a significant role in the healthcare industry.
