Evaluating Text-to-Image Generation in Pediatric Ophthalmology

Sarah Jong, Qais A Dihan, Mohamed M Khodeiry, Ahmad Alzein, Christina Scelfo, Abdelrahman M Elhusseiny

Journal of Pediatric Ophthalmology & Strabismus, published September 26, 2025. DOI: 10.3928/01913913-20250724-03
Abstract
Purpose: To evaluate the quality and accuracy of artificial intelligence (AI)-generated images depicting pediatric ophthalmology pathologies compared to human-illustrated images, and assess the readability, quality, and accuracy of accompanying AI-generated textual information.
Methods: This cross-sectional comparative study analyzed outputs from DALL·E 3 (OpenAI) and Gemini Advanced (Google). Nine pediatric ophthalmology pathologies were sourced from the American Association for Pediatric Ophthalmology and Strabismus (AAPOS) "Most Common Searches." Two prompts were used: Prompt A asked large language models (LLMs), "What is [insert pathology]?" Prompt B requested text-to-image generators (TTIs) to create images of the pathologies. Textual responses were evaluated for quality using published criteria (helpfulness, truthfulness, harmlessness; score 1 to 15, ≥ 12: high quality) and for readability using the Simple Measure of Gobbledygook (SMOG) and Flesch-Kincaid Grade Level (FKGL) (≤ 6th-grade level: readable). Images were assessed for anatomical accuracy, pathological accuracy, artifacts, and color (score 1 to 15, ≥ 12: high quality). Human-illustrated images served as controls.
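For readers unfamiliar with the two readability metrics, their standard published formulas can be sketched in a few lines of Python. This is a minimal illustration, not the study's actual analysis pipeline: the syllable counter is a rough vowel-group heuristic (validated tools use pronunciation dictionaries), and the sample sentence is hypothetical, not taken from the study's data.

```python
import math
import re

def count_syllables(word: str) -> int:
    """Rough heuristic: count contiguous vowel groups (real tools use dictionaries such as CMUdict)."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def fkgl(text: str) -> float:
    """Flesch-Kincaid Grade Level = 0.39*(words/sentences) + 11.8*(syllables/words) - 15.59."""
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = re.findall(r"[A-Za-z']+", text)
    n_words = max(1, len(words))
    syllables = sum(count_syllables(w) for w in words)
    return 0.39 * (n_words / sentences) + 11.8 * (syllables / n_words) - 15.59

def smog(text: str) -> float:
    """SMOG grade = 1.0430*sqrt(polysyllabic words * 30/sentences) + 3.1291.
    Note: SMOG was designed for samples of 30 sentences; short texts give unstable estimates."""
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = re.findall(r"[A-Za-z']+", text)
    polysyllables = sum(1 for w in words if count_syllables(w) >= 3)
    return 1.0430 * math.sqrt(polysyllables * 30 / sentences) + 3.1291

# Illustrative sentence only (not from the study's LLM outputs).
sample = "Amblyopia is reduced vision in one eye caused by abnormal visual development."
print(f"FKGL: {fkgl(sample):.1f}  SMOG: {smog(sample):.1f}")
```

Both formulas map text statistics to a US school-grade level, which is why the study's threshold for "readable" patient-facing text is a 6th-grade score or below.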
Results: DALL·E 3 images were of poor quality (median: 7; range: 3 to 15) and significantly worse than human-illustrated controls (median: 15; range: 9 to 15; P < .001). Pathological accuracy was also poor (median: 1). Textual information from ChatGPT-4o and Gemini Advanced was high quality (median: 15) but difficult to read (ChatGPT-4o: SMOG: 8.2, FKGL: 8.9; Gemini Advanced: SMOG: 8.5, FKGL: 9.3).
Conclusions: Text-to-image generators are poor at generating images of common pediatric ophthalmology pathologies. They can serve as adequate supplemental tools for generating high-quality, accurate textual information, but the generated text must be tailored so that it is readable by its intended users.
Journal overview:
The Journal of Pediatric Ophthalmology & Strabismus is a bimonthly peer-reviewed publication for pediatric ophthalmologists. For over 50 years, the Journal has published original articles on the diagnosis, treatment, and prevention of eye disorders in the pediatric age group and on the treatment of strabismus in all age groups.