Bnar Massraf, Ka Kiu Cheris Chan, Nikhil Jain, Jesse Panthagani
Assessing accuracy, readability & reliability of AI-generated patient leaflets on Descemet membrane endothelial keratoplasty.
European Journal of Ophthalmology. Journal Article. Published 2025-08-28.
DOI: 10.1177/11206721251367562 (https://doi.org/10.1177/11206721251367562)
Citation count: 0
Abstract
Purpose: This study assessed the readability, reliability and accuracy of patient information leaflets on Descemet Membrane Endothelial Keratoplasty (DMEK) generated by seven large language models (LLMs). The aim was to determine which LLM produced the most patient-friendly, comprehensible and evidence-based leaflet, measured against a leaflet written by clinicians from a tertiary centre.

Methods: Each LLM was given the prompt, "Make a patient information leaflet on Descemet Membrane Endothelial Keratoplasty (DMEK) surgery." Readability metrics (Flesch-Kincaid Grade [FKG], Flesch Reading Ease [FRE], Automated Readability Index [ARI], Gunning Fog), reliability metrics (DISCERN, PEMAT), misinformation detection and reference analysis were recorded for each response. A weighted scoring system normalised results on a 0-100% scale.

Results: The clinician-generated leaflet scored the highest (92%). Claude 3.7 Sonnet had the top LLM score (77.8%), with strong readability and referencing. ChatGPT-4o followed closely (70.9%) but lacked references. DeepSeek-V3, Perplexity AI and Google Gemini 2.0 Flash achieved moderate scores. ChatGPT-4 and Microsoft CoPilot scored the lowest due to limited reliability and misinformation.

Conclusions: LLMs show promise in generating patient education material but vary in reliability and accuracy. Claude 3.7 Sonnet was the best-performing LLM, though none matched the quality of the clinician-generated leaflet. LLM-generated leaflets therefore require clinician oversight before safe clinical use.
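For readers unfamiliar with the four readability metrics named in the Methods, the sketch below computes them from pre-counted text statistics using their standard published formulas. It is an illustration only: the abstract does not say which software the authors used, and the `TextCounts` container, the function names and the sample counts are hypothetical. The study's weighted scoring system is not reproduced, because the abstract does not state its weights.

```python
from dataclasses import dataclass


@dataclass
class TextCounts:
    """Pre-counted statistics for one leaflet (hypothetical container)."""
    sentences: int
    words: int
    syllables: int
    characters: int      # letters and digits only, as ARI expects
    complex_words: int   # words of three or more syllables, for Gunning Fog


def flesch_reading_ease(c: TextCounts) -> float:
    # Higher scores mean easier text.
    return 206.835 - 1.015 * (c.words / c.sentences) - 84.6 * (c.syllables / c.words)


def flesch_kincaid_grade(c: TextCounts) -> float:
    # Approximate US school grade needed to understand the text.
    return 0.39 * (c.words / c.sentences) + 11.8 * (c.syllables / c.words) - 15.59


def automated_readability_index(c: TextCounts) -> float:
    # Uses characters per word instead of syllables per word.
    return 4.71 * (c.characters / c.words) + 0.5 * (c.words / c.sentences) - 21.43


def gunning_fog(c: TextCounts) -> float:
    # Combines average sentence length with the percentage of complex words.
    return 0.4 * ((c.words / c.sentences) + 100 * (c.complex_words / c.words))


# Made-up counts for illustration; these are not data from the study.
sample = TextCounts(sentences=10, words=150, syllables=225,
                    characters=700, complex_words=15)

if __name__ == "__main__":
    print(f"FRE: {flesch_reading_ease(sample):.2f}")
    print(f"FKG: {flesch_kincaid_grade(sample):.2f}")
    print(f"ARI: {automated_readability_index(sample):.2f}")
    print(f"Gunning Fog: {gunning_fog(sample):.2f}")
```

Taking counts as input keeps the sketch independent of any particular syllable-counting heuristic, which is where readability tools tend to disagree.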
About the journal:
The European Journal of Ophthalmology was founded in 1991 and is issued in print bi-monthly. It publishes only peer-reviewed original research reporting clinical observations and laboratory investigations with clinical relevance focusing on new diagnostic and surgical techniques, instrument and therapy updates, results of clinical trials and research findings.