Bnar Massraf, Ka Kiu Cheris Chan, Nikhil Jain, Jesse Panthagani
European Journal of Ophthalmology. Journal article, published 2025-08-28. DOI: 10.1177/11206721251367562 (https://doi.org/10.1177/11206721251367562)
Assessing accuracy, readability & reliability of AI-generated patient leaflets on Descemet membrane endothelial keratoplasty.
Purpose: This study assessed the readability, reliability and accuracy of patient information leaflets on Descemet Membrane Endothelial Keratoplasty (DMEK) generated by seven large language models (LLMs). The aim was to determine which LLM produced the most patient-friendly, comprehensible and evidence-based leaflet, measured against a leaflet written by clinicians from a tertiary centre.
Methods: Each LLM was given the prompt, "Make a patient information leaflet on Descemet Membrane Endothelial Keratoplasty (DMEK) surgery." Readability metrics (FKG, FRE, ARI, Gunning Fog), reliability metrics (DISCERN, PEMAT), misinformation detection and reference analysis were recorded for each response. A weighted scoring system normalised results on a 0-100% scale.
Results: The clinician-generated leaflet scored highest (92%). Claude 3.7 Sonnet had the top LLM score (77.8%), with strong readability and referencing. ChatGPT-4o followed closely (70.9%) but lacked references. DeepSeek-V3, Perplexity AI and Google Gemini 2.0 Flash achieved moderate scores. ChatGPT-4 and Microsoft CoPilot scored lowest owing to limited reliability and misinformation.
Conclusions: LLMs show promise in generating patient education material but vary in reliability and accuracy. Claude 3.7 Sonnet was the best-performing LLM, though none matched the quality of the clinician-generated leaflet. LLM-generated leaflets therefore require clinician oversight before safe clinical use.
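The abstract names the readability metrics and a weighted 0-100% score but gives neither the formulas nor the weights. As a rough illustration only, the sketch below uses the standard published formulas, reading FKG, FRE and ARI as Flesch-Kincaid Grade Level, Flesch Reading Ease and Automated Readability Index (an assumption about the abbreviations). The `TextCounts` type, the function names and the weighting scheme in `weighted_score` are hypothetical and are not taken from the study; how the authors counted syllables or weighted each component is not stated.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class TextCounts:
    """Raw counts for one leaflet. How they are obtained is left to the caller."""
    words: int
    sentences: int
    syllables: int
    characters: int      # letters and digits, excluding spaces and punctuation
    complex_words: int   # words with three or more syllables

    @property
    def words_per_sentence(self) -> float:
        return self.words / self.sentences


def flesch_reading_ease(c: TextCounts) -> float:
    """Flesch Reading Ease: higher means easier to read."""
    return 206.835 - 1.015 * c.words_per_sentence - 84.6 * (c.syllables / c.words)


def flesch_kincaid_grade(c: TextCounts) -> float:
    """Flesch-Kincaid Grade Level: approximate US school grade."""
    return 0.39 * c.words_per_sentence + 11.8 * (c.syllables / c.words) - 15.59


def automated_readability_index(c: TextCounts) -> float:
    """Automated Readability Index: based on characters per word."""
    return 4.71 * (c.characters / c.words) + 0.5 * c.words_per_sentence - 21.43


def gunning_fog(c: TextCounts) -> float:
    """Gunning Fog index: based on the share of complex words."""
    return 0.4 * (c.words_per_sentence + 100.0 * (c.complex_words / c.words))


def weighted_score(components: dict[str, float], weights: dict[str, float]) -> float:
    """Hypothetical composite: weighted mean of components, as a percentage.

    Each component is assumed to be pre-scaled to the range 0..1, with 1 the
    best outcome. The study's actual weights and scaling are not given in the
    abstract, so both are inputs here.
    """
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    weighted_sum = sum(weights[name] * components[name] for name in weights)
    return 100.0 * weighted_sum / total_weight
```

Grade-level metrics (Flesch-Kincaid, ARI, Gunning Fog) fall as text gets easier, while Flesch Reading Ease rises, so any composite of this kind has to invert or rescale them onto a common direction before weighting.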
About the journal:
The European Journal of Ophthalmology was founded in 1991 and is issued in print bi-monthly. It publishes only peer-reviewed original research reporting clinical observations and laboratory investigations with clinical relevance focusing on new diagnostic and surgical techniques, instrument and therapy updates, results of clinical trials and research findings.