Assessing accuracy, readability & reliability of AI-generated patient leaflets on Descemet membrane endothelial keratoplasty.

Impact factor 1.4; Zone 4 (Medicine); JCR Q3 (Ophthalmology)

European Journal of Ophthalmology Pub Date : 2025-08-28 DOI:10.1177/11206721251367562

Bnar Massraf, Ka Kiu Cheris Chan, Nikhil Jain, Jesse Panthagani


Citations: 0

Abstract

Purpose: This study assessed the readability, reliability and accuracy of patient information leaflets on Descemet Membrane Endothelial Keratoplasty (DMEK) generated by seven large language models (LLMs). The aim was to determine which LLM produced the most patient-friendly, comprehensible and evidence-based leaflet, measured against a leaflet written by clinicians from a tertiary centre.

Methods: Each LLM was given the prompt, "Make a patient information leaflet on Descemet Membrane Endothelial Keratoplasty (DMEK) surgery." Readability metrics (Flesch-Kincaid Grade [FKG], Flesch Reading Ease [FRE], Automated Readability Index [ARI], Gunning Fog), reliability metrics (DISCERN, PEMAT), misinformation detection and reference analysis were recorded for each response. A weighted scoring system normalised results on a 0-100% scale.

Results: The clinician-generated leaflet scored highest (92%). Claude 3.7 Sonnet had the top LLM score (77.8%), with strong readability and referencing. ChatGPT-4o followed (70.9%) but lacked references. DeepSeek-V3, Perplexity AI and Google Gemini 2.0 Flash achieved moderate scores. ChatGPT-4 and Microsoft CoPilot scored lowest because of limited reliability and misinformation.

Conclusions: LLMs show promise in generating patient education material but vary in reliability and accuracy. Claude 3.7 Sonnet was the best-performing LLM, though no LLM matched the quality of the clinician-generated leaflet. LLM-generated leaflets therefore require clinician oversight before safe clinical use.
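For readers unfamiliar with the four readability metrics named in the Methods, the sketch below shows how they are commonly computed from simple text counts. The formulas are the standard published ones; the abstract does not say which tool the authors used, so the function names (`count_syllables`, `text_counts`, `readability`) and the vowel-group syllable heuristic are assumptions made here for illustration only, and dedicated tools may give slightly different counts.

```python
import re


def count_syllables(word: str) -> int:
    """Rough heuristic (an assumption, not the study's method):
    count vowel groups and drop a trailing silent 'e'."""
    word = word.lower()
    n = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith("le") and n > 1:
        n -= 1
    return max(n, 1)


def text_counts(text: str) -> dict:
    """Derive the raw counts the formulas need from plain text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z0-9]+", text)
    syllables = [count_syllables(w) for w in words]
    return {
        "sentences": len(sentences),
        "words": len(words),
        "chars": sum(len(w) for w in words),
        "syllables": sum(syllables),
        # Gunning Fog treats words of three or more syllables as complex.
        "complex_words": sum(1 for s in syllables if s >= 3),
    }


def readability(sentences: int, words: int, chars: int,
                syllables: int, complex_words: int) -> dict:
    """Apply the standard formulas for FRE, FKG, ARI and Gunning Fog."""
    wps = words / sentences   # average words per sentence
    spw = syllables / words   # average syllables per word
    cpw = chars / words       # average characters per word
    return {
        "FRE": 206.835 - 1.015 * wps - 84.6 * spw,
        "FKG": 0.39 * wps + 11.8 * spw - 15.59,
        "ARI": 4.71 * cpw + 0.5 * wps - 21.43,
        "GunningFog": 0.4 * (wps + 100 * complex_words / words),
    }


if __name__ == "__main__":
    sample = "The cornea is the clear window of the eye. Surgery replaces its inner layer."
    print(readability(**text_counts(sample)))
```

Higher FRE values indicate easier text, whereas FKG, ARI and Gunning Fog approximate the school grade level needed to read it, so lower values are better for patient leaflets.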


Source journal: European Journal of Ophthalmology (Medicine: Ophthalmology)

CiteScore: 3.60

Self-citation rate: 0.00%

Articles published: 372

Review time: 3-8 weeks

About the journal: The European Journal of Ophthalmology was founded in 1991 and is issued in print bi-monthly. It publishes only peer-reviewed original research reporting clinical observations and laboratory investigations with clinical relevance, focusing on new diagnostic and surgical techniques, instrument and therapy updates, results of clinical trials and research findings.