Accuracy and Readability of ChatGPT Responses to Patient-Centric Strabismus Questions
Ashlyn A Gary, James M Lai, Elyana V T Locatelli, Michelle M Falcone, Kara M Cavuoto
Journal of Pediatric Ophthalmology & Strabismus, published 2025-02-19, pp. 1-8. DOI: 10.3928/01913913-20250110-02
Citations: 0
Abstract
Purpose: To assess the medical accuracy and readability of responses provided by ChatGPT (OpenAI), the most widely used artificial intelligence-powered chatbot, to questions about strabismus.
Methods: Thirty-four questions were input into ChatGPT 3.5 (free version) and 4.0 (paid version) at three time intervals (day 0, 1 week, and 1 month) in two distinct geographic locations (California and Florida) in March 2024. Two pediatric ophthalmologists rated responses as "acceptable," "accurate but missing key information or minor inaccuracies," or "inaccurate and potentially harmful." The online tool, Readable, measured the Flesch-Kincaid Grade Level and Flesch Reading Ease Score to assess readability.
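The two readability metrics reported here follow standard published formulas based on words per sentence and syllables per word. As a rough illustration of how a tool like Readable derives these scores, the sketch below implements both formulas in Python with a naive vowel-group syllable counter (a simplifying assumption; production tools use more accurate syllable estimation, so scores will differ somewhat).

```python
import re

def count_syllables(word: str) -> int:
    # Naive heuristic: count groups of consecutive vowels, with a crude
    # silent-'e' adjustment. Real readability tools are more accurate.
    groups = re.findall(r"[aeiouy]+", word.lower())
    n = len(groups)
    if word.lower().endswith("e") and n > 1:
        n -= 1
    return max(n, 1)

def readability(text: str) -> tuple[float, float]:
    """Return (Flesch-Kincaid Grade Level, Flesch Reading Ease) for text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    wps = len(words) / len(sentences)   # average words per sentence
    spw = syllables / len(words)        # average syllables per word
    # Standard Flesch-Kincaid Grade Level and Flesch Reading Ease formulas:
    fkgl = 0.39 * wps + 11.8 * spw - 15.59
    fres = 206.835 - 1.015 * wps - 84.6 * spw
    return fkgl, fres
```

Longer sentences and more polysyllabic words raise the grade level and lower the ease score, which is why clinical prose (a grade level of 15 here) scores as hard to read.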
Results: Overall, 64% of responses by ChatGPT were "acceptable," but the proportion of "acceptable" responses differed by version (47% for ChatGPT 3.5 vs 53% for 4.0, P < .05) and state (77% in California vs 51% in Florida, P < .001). Responses in Florida were more likely to be "inaccurate and potentially harmful" than those in California (6.9% vs 1.5%, P < .001). Over 1 month, the overall percentage of "acceptable" responses increased, although not significantly (60% at day 0, 64% at 1 week, and 67% at 1 month, P > .05), whereas the percentage of "inaccurate and potentially harmful" responses decreased (5% at day 0, 5% at 1 week, and 3% at 1 month, P > .05). On average, responses had a Flesch-Kincaid Grade Level of 15, corresponding to a reading level above high school.
Conclusions: Although most of ChatGPT's responses to strabismus questions were clinically acceptable, there were variations in responses across time and geographic regions. The average reading level exceeded a high school level and demonstrated low readability. Although ChatGPT demonstrates potential as a supplementary resource for parents and patients with strabismus, improving the accuracy and readability of free versions of ChatGPT may increase its utility. [J Pediatr Ophthalmol Strabismus. 20XX;X(X):XXX-XXX.].
About the Journal
The Journal of Pediatric Ophthalmology & Strabismus is a bimonthly peer-reviewed publication for pediatric ophthalmologists. For more than 50 years, the Journal has published original articles on the diagnosis, treatment, and prevention of eye disorders in the pediatric age group and on the treatment of strabismus in all age groups.