Eyes on the Text: Assessing Readability of AI & Ophthalmologist Responses to Patient Surgery Queries.

IF 2.1 | CAS Tier 4 (Medicine) | JCR Q2 (Ophthalmology)
Ophthalmologica | Pub Date: 2025-03-10 | DOI: 10.1159/000544917
Sai S Kurapati, Derek J Barnett, Antonio Yaghy, Cameron J Sabet, David N Younessi, Dang Nguyen, John C Lin, Ingrid U Scott
{"title":"Eyes on the Text: Assessing Readability of AI & Ophthalmologist Responses to Patient Surgery Queries.","authors":"Sai S Kurapati, Derek J Barnett, Antonio Yaghy, Cameron J Sabet, David N Younessi, Dang Nguyen, John C Lin, Ingrid U Scott","doi":"10.1159/000544917","DOIUrl":null,"url":null,"abstract":"<p><strong>Introduction: </strong>Generative artificial intelligence (AI) technologies like GPT-4 can instantaneously provide health information to patients; however, the readability of these outputs compared to ophthalmologist-written responses is unknown. This study aims to evaluate the readability of GPT-4-generated and ophthalmologist-written responses to patient queries about ophthalmic surgery.</p><p><strong>Methods: </strong>This retrospective cross-sectional study used 200 randomly selected patient questions about ophthalmic surgery extracted from the American Academy of Ophthalmology's EyeSmart platform. The questions were inputted into GPT-4, and the generated responses were recorded. Ophthalmologist-written replies to the same questions were compiled for comparison. Readability of GPT-4 and ophthalmologist responses was assessed using six validated metrics: Flesch Kincaid Reading Ease (FK-RE), Flesch Kincaid Grade Level (FK-GL), Gunning Fog Score (GFS), SMOG Index (SI), Coleman Liau Index (CLI), and Automated Readability Index (ARI). Descriptive statistics, one-way ANOVA, Shapiro-Wilk, and Levene's tests (α=0.05) were used to compare readability between the two groups.</p><p><strong>Results: </strong>GPT-4 used a higher percentage of complex words (24.42%) compared to ophthalmologists (17.76%), although mean [SD] word count per sentence was similar (18.43 [2.95] and 18.01 [6.09]). Across all metrics (FK-RE; FK-GL; GFS; SI; CLI; and ARI), GPT-4 responses were at a higher grade level (34.39 [8.51]; 13.19 [2.63]; 16.37 [2.04]; 12.18 [1.43]; 15.72 [1.40]; 12.99 [1.86]) than ophthalmologists' responses (50.61 [15.53]; 10.71 [2.99]; 14.13 [3.55]; 10.07 [2.46]; 12.64 [2.93]; 10.40 [3.61]), with both sources necessitating a 12th-grade education for comprehension. ANOVA tests showed significance (p<0.05) for all comparisons except word count (p=0.438).</p><p><strong>Conclusions: </strong>The National Institutes of Health advises health information to be written at a sixth-seventh grade level. Both GPT-4- and ophthalmologist-written answers exceeded this recommendation, with GPT-4 showing a greater gap. Information accessibility is vital when designing patient resources, particularly with the rise of AI as an educational tool.</p>","PeriodicalId":19595,"journal":{"name":"Ophthalmologica","volume":" ","pages":"1-18"},"PeriodicalIF":2.1000,"publicationDate":"2025-03-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Ophthalmologica","FirstCategoryId":"3","ListUrlMain":"https://doi.org/10.1159/000544917","RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"OPHTHALMOLOGY","Score":null,"Total":0}
Citations: 0

Abstract

Introduction: Generative artificial intelligence (AI) technologies like GPT-4 can instantaneously provide health information to patients; however, the readability of these outputs compared to ophthalmologist-written responses is unknown. This study aims to evaluate the readability of GPT-4-generated and ophthalmologist-written responses to patient queries about ophthalmic surgery.

Methods: This retrospective cross-sectional study used 200 randomly selected patient questions about ophthalmic surgery extracted from the American Academy of Ophthalmology's EyeSmart platform. The questions were entered into GPT-4, and the generated responses were recorded. Ophthalmologist-written replies to the same questions were compiled for comparison. Readability of the GPT-4 and ophthalmologist responses was assessed using six validated metrics: Flesch-Kincaid Reading Ease (FK-RE), Flesch-Kincaid Grade Level (FK-GL), Gunning Fog Score (GFS), SMOG Index (SI), Coleman-Liau Index (CLI), and Automated Readability Index (ARI). Descriptive statistics, one-way ANOVA, and Shapiro-Wilk and Levene's tests (α = 0.05) were used to compare readability between the two groups.
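The paper does not name its software, so the following is only a minimal sketch of the scoring-and-testing pipeline described above, assuming the open-source textstat and scipy packages as stand-ins:

```python
# Sketch of the study's readability-scoring and statistical-testing pipeline.
# textstat and scipy are assumed tooling; the abstract does not name its software.
import textstat
from scipy import stats

def readability_profile(text: str) -> dict:
    """Score one response on the six metrics used in the study."""
    return {
        "FK-RE": textstat.flesch_reading_ease(text),
        "FK-GL": textstat.flesch_kincaid_grade(text),
        "GFS":   textstat.gunning_fog(text),
        "SI":    textstat.smog_index(text),
        "CLI":   textstat.coleman_liau_index(text),
        "ARI":   textstat.automated_readability_index(text),
    }

def compare_metric(gpt4_scores: list, md_scores: list) -> dict:
    """Assumption checks plus one-way ANOVA at alpha = 0.05, per the Methods."""
    return {
        "shapiro_gpt4": stats.shapiro(gpt4_scores).pvalue,            # normality
        "shapiro_md":   stats.shapiro(md_scores).pvalue,
        "levene_p":     stats.levene(gpt4_scores, md_scores).pvalue,  # equal variances
        "anova_p":      stats.f_oneway(gpt4_scores, md_scores).pvalue,
    }
```

In this setup, readability_profile would be applied to each of the 200 GPT-4 and 200 ophthalmologist responses, and compare_metric run once per metric across the two groups.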

Results: GPT-4 used a higher percentage of complex words (24.42%) than ophthalmologists (17.76%), although mean [SD] word count per sentence was similar (18.43 [2.95] vs. 18.01 [6.09]). Across all metrics (FK-RE, FK-GL, GFS, SI, CLI, and ARI), GPT-4 responses scored at a higher reading level (34.39 [8.51]; 13.19 [2.63]; 16.37 [2.04]; 12.18 [1.43]; 15.72 [1.40]; 12.99 [1.86]) than ophthalmologists' responses (50.61 [15.53]; 10.71 [2.99]; 14.13 [3.55]; 10.07 [2.46]; 12.64 [2.93]; 10.40 [3.61]); note that lower FK-RE scores indicate harder text, so all six metrics point in the same direction. Both sources required a 12th-grade education for comprehension. ANOVA showed significance (p < 0.05) for all comparisons except word count per sentence (p = 0.438).
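For context, the two Flesch formulas (standard textbook definitions, not restated in the abstract) show why these results hold:

```latex
\[
\text{FK-RE} = 206.835 - 1.015\left(\frac{\text{words}}{\text{sentences}}\right)
             - 84.6\left(\frac{\text{syllables}}{\text{words}}\right)
\]
\[
\text{FK-GL} = 0.39\left(\frac{\text{words}}{\text{sentences}}\right)
             + 11.8\left(\frac{\text{syllables}}{\text{words}}\right) - 15.59
\]
```

With words per sentence statistically indistinguishable between the groups (p = 0.438), the syllables-per-word term, driven by GPT-4's larger share of complex words, accounts for its lower FK-RE and higher FK-GL.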

Conclusions: The National Institutes of Health advises that health information be written at a sixth- to seventh-grade reading level. Both GPT-4- and ophthalmologist-written answers exceeded this recommendation, with GPT-4 showing the larger gap. Information accessibility is vital when designing patient resources, particularly with the rise of AI as an educational tool.

Source journal
Ophthalmologica (Medicine - Ophthalmology)
CiteScore: 5.10
Self-citation rate: 3.80%
Articles published: 39
Review turnaround: 3 months
Journal description: Published since 1899, Ophthalmologica has become a frequently cited guide to international work in clinical and experimental ophthalmology. It contains a selection of patient-oriented contributions covering the etiology of eye diseases, diagnostic techniques, and advances in medical and surgical treatment. Straightforward, factual reporting makes for both interesting and useful reading. In addition to original papers, Ophthalmologica regularly features timely reviews to keep the reader well informed and up to date. The journal's large international circulation reflects its importance.