Can AI Improve the Readability of Patient Education Information in Gynecology?

IF 8.7 · Tier 1 (Medicine) · Q1 OBSTETRICS & GYNECOLOGY
Naveena R Daram, Rose A Maxwell, Josette D'Amato, Jason C Massengill
DOI: 10.1016/j.ajog.2025.06.047 · Published: 2025-06-25 · Journal Article, American Journal of Obstetrics and Gynecology
Citations: 0

Abstract

Can AI Improve the Readability of Patient Education Information in Gynecology?

Background: The American Medical Association recommends that patient information be written at a 6th grade level to increase accessibility. However, most existing patient education materials exceed this threshold, posing challenges to patient comprehension. Artificial Intelligence (AI), particularly large language models (LLMs), presents an opportunity to improve the readability of medical information. Despite the growing integration of AI in healthcare, few studies have evaluated the effectiveness of LLMs in generating or improving readability of existing patient education materials within gynecology.

Objective: To assess the readability and effectiveness of patient education materials generated by ChatGPT, Gemini, and CoPilot compared with materials from the American College of Obstetricians and Gynecologists (ACOG) and UpToDate.com, and to determine whether these LLMs can successfully adjust the reading level to a 6th-grade standard.

Study design: This cross-sectional study analyzed ACOG, UpToDate, and LLM-generated content, evaluating LLMs on two tasks: (1) generating patient education materials independently and (2) rewriting existing patient information down to a 6th-grade level. All materials underwent basic textual analysis and were scored with eight readability formulas. Two board-certified OBGYNs evaluated blinded patient education materials for accuracy, clarity, and comprehension. ANOVA was used to compare textual-analysis and readability scores, with Tukey post-hoc tests identifying differences for both original and enhanced materials. An alpha threshold of p<.004 was used to account for multiple comparisons.
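The abstract does not name the eight readability formulas used, but grade-level scores like those reported below are typically produced by formulas such as Flesch-Kincaid. A minimal sketch of the Flesch-Kincaid grade level, with a naive vowel-group syllable counter (an approximation; real tools use dictionaries or more careful heuristics):

```python
import re

def count_syllables(word: str) -> int:
    """Approximate syllables as runs of vowels (y included)."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))

def flesch_kincaid_grade(text: str) -> float:
    """FK grade = 0.39*(words/sentences) + 11.8*(syllables/words) - 15.59."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return (0.39 * (len(words) / len(sentences))
            + 11.8 * (syllables / len(words))
            - 15.59)

# Short, plain sentences score at a low grade level
sample = "The doctor will check your health. You may ask questions at any time."
print(round(flesch_kincaid_grade(sample), 1))
```

Scores rise with longer sentences and more polysyllabic words, which is why the LLM outputs described below could be shorter overall yet score at a higher grade level.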

Results: LLM-generated materials were significantly shorter (mean word count 407.9 vs. 1132.0; p<.001) but had a higher proportion of difficult words (36.7% vs. 27.4%; p<.001). ACOG and UpToDate materials averaged 9.0 and 8.6 grade levels, respectively, while AI-generated content reached a 10.6 grade level (p=0.008). Although CoPilot and Gemini improved readability when prompted, no LLM reached the 6th-grade benchmark, and ChatGPT increased reading difficulty.
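The group comparisons above rest on a one-way ANOVA (see Study design). A stdlib-only sketch of the F statistic (between-group vs. within-group variance); the grade-level samples are hypothetical numbers for illustration, not data from the study:

```python
import statistics

def one_way_anova_f(groups):
    """One-way ANOVA F = (SS_between/(k-1)) / (SS_within/(n-k))."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    ss_between = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2
                     for g in groups)
    ss_within = sum(sum((x - statistics.mean(g)) ** 2 for x in g)
                    for g in groups)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

# Hypothetical grade-level scores for three sources (illustration only)
acog = [9.1, 8.8, 9.3, 9.0]
uptodate = [8.5, 8.7, 8.6, 8.6]
llm = [10.4, 10.8, 10.5, 10.7]
print(one_way_anova_f([acog, uptodate, llm]))
```

A large F indicates the group means differ more than within-group noise would explain; Tukey's post-hoc test then identifies which pairs of groups differ, which is why the study paired the two.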

Conclusions: LLMs generated more concise patient education materials but often introduced more complex vocabulary, ultimately failing to meet recommended health literacy standards. Even when explicitly prompted, no LLM achieved the 6th-grade reading level required for optimal patient comprehension. Without proper oversight, AI-generated patient education materials may create the illusion of simplicity while reducing true accessibility. Future efforts should focus on integrating health literacy safeguards into AI models before clinical implementation.

Source journal metrics:
CiteScore: 15.90
Self-citation rate: 7.10%
Articles published: 2237
Review time: 47 days
Journal description: The American Journal of Obstetrics and Gynecology, known as "The Gray Journal," covers the entire spectrum of Obstetrics and Gynecology. It aims to publish original research (clinical and translational), reviews, opinions, video clips, podcasts, and interviews that contribute to understanding health and disease and have the potential to impact the practice of women's healthcare. Its focus areas are the diagnosis, treatment, prediction, and prevention of obstetrical and gynecological disorders, and the biology of reproduction, including reproductive physiology and mechanisms of obstetrical and gynecological diseases. Content types include original clinical and translational research, comprehensive reviews, opinions, and multimedia (video clips, podcasts, and interviews). All submissions undergo a rigorous peer review process to ensure quality and relevance to the field.