Evaluating Text-to-Image Generation in Pediatric Ophthalmology.

IF 0.9 · CAS Region 4 (Medicine) · JCR Q4 OPHTHALMOLOGY
Sarah Jong, Qais A Dihan, Mohamed M Khodeiry, Ahmad Alzein, Christina Scelfo, Abdelrahman M Elhusseiny
DOI: 10.3928/01913913-20250724-03
Journal: Journal of Pediatric Ophthalmology & Strabismus, pp. 1-7
Publication date: 2025-09-26 (Journal Article)
Citations: 0

Abstract


Purpose: To evaluate the quality and accuracy of artificial intelligence (AI)-generated images depicting pediatric ophthalmology pathologies compared to human-illustrated images, and assess the readability, quality, and accuracy of accompanying AI-generated textual information.

Methods: This cross-sectional comparative study analyzed outputs from DALL·E 3 (OpenAI) and Gemini Advanced (Google). Nine pediatric ophthalmology pathologies were sourced from the American Association for Pediatric Ophthalmology and Strabismus (AAPOS) "Most Common Searches." Two prompts were used: Prompt A asked large language models (LLMs), "What is [insert pathology]?" Prompt B requested text-to-image generators (TTIs) to create images of the pathologies. Textual responses were evaluated for quality using published criteria (helpfulness, truthfulness, harmlessness; score 1 to 15, ≥ 12: high quality) and readability using Simple Measure of Gobbledygook (SMOG) and Flesch-Kincaid Grade Level (≤ 6th-grade level: readable). Images were assessed for anatomical accuracy, pathological accuracy, artifacts, and color (score 1 to 15, ≥ 12: high quality). Human-illustrated images served as controls.
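The readability thresholds above rest on two standard published formulas: Flesch-Kincaid Grade Level (0.39 × words/sentences + 11.8 × syllables/words − 15.59) and SMOG (1.0430 × √(polysyllables × 30/sentences) + 3.1291). As an illustration only (the study's actual tooling is not specified here, and the vowel-group syllable counter below is a crude assumption), a minimal Python sketch:

```python
import math
import re

def count_syllables(word: str) -> int:
    """Crude syllable estimate: count contiguous vowel groups, minimum one per word."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def flesch_kincaid_grade(text: str) -> float:
    """FKGL = 0.39*(words/sentences) + 11.8*(syllables/words) - 15.59"""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+", text)
    syllables = sum(count_syllables(w) for w in words)
    return 0.39 * (len(words) / len(sentences)) + 11.8 * (syllables / len(words)) - 15.59

def smog_grade(text: str) -> float:
    """SMOG = 1.0430*sqrt(polysyllables * 30/sentences) + 3.1291"""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    polysyllables = sum(1 for w in re.findall(r"[A-Za-z]+", text) if count_syllables(w) >= 3)
    return 1.0430 * math.sqrt(polysyllables * 30 / len(sentences)) + 3.1291
```

Under the study's criterion, a passage is "readable" when these scores are at or below 6 (a 6th-grade reading level); production readability tools use more careful sentence and syllable segmentation than this sketch.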

Results: DALL·E 3 images were of poor quality (median: 7; range: 3 to 15) and significantly worse than human-illustrated controls (median: 15; range: 9 to 15; P < .001). Pathological accuracy was also poor (median: 1). Textual information from ChatGPT-4o and Gemini Advanced was high quality (median: 15) but difficult to read (ChatGPT-4o: SMOG: 8.2, FKGL: 8.9; Gemini Advanced: SMOG: 8.5, FKGL: 9.3).

Conclusions: Text-to-image generators are poor at generating images of common pediatric ophthalmology pathologies. They can serve as adequate supplemental tools for generating high-quality accurate textual information, but care must be taken to tailor generated text to be readable by users.

Source journal
CiteScore: 1.80
Self-citation rate: 8.30%
Articles per year: 115
Review time: >12 weeks
Journal description: The Journal of Pediatric Ophthalmology & Strabismus is a bimonthly peer-reviewed publication for pediatric ophthalmologists. The Journal has published original articles on the diagnosis, treatment, and prevention of eye disorders in the pediatric age group and the treatment of strabismus in all age groups for over 50 years.