Assessing accuracy, readability and reliability of AI-generated patient leaflets on Descemet membrane endothelial keratoplasty

IF 1.4 · Zone 4 (Medicine) · JCR Q3 OPHTHALMOLOGY
Bnar Massraf, Ka Kiu Cheris Chan, Nikhil Jain, Jesse Panthagani
European Journal of Ophthalmology. Journal Article. Published 2025-08-28. DOI: 10.1177/11206721251367562 (https://doi.org/10.1177/11206721251367562)
Citations: 0

Abstract

Purpose: This study assessed the readability, reliability and accuracy of patient information leaflets on Descemet Membrane Endothelial Keratoplasty (DMEK) generated by seven large language models (LLMs). The aim was to determine which LLM produced the most patient-friendly, comprehensible and evidence-based leaflet, measured against a leaflet written by clinicians from a tertiary centre.

Methods: Each LLM was given the prompt, "Make a patient information leaflet on Descemet Membrane Endothelial Keratoplasty (DMEK) surgery." Readability metrics (FKG, FRE, ARI, Gunning Fog), reliability metrics (DISCERN, PEMAT), misinformation detection and reference analysis were recorded for each response. A weighted scoring system normalised results on a 0-100% scale.

Results: The clinician-generated leaflet scored the highest (92%). Claude 3.7 Sonnet had the top LLM score (77.8%), with strong readability and referencing. ChatGPT-4o followed closely (70.9%) but lacked references. DeepSeek-V3, Perplexity AI and Google Gemini 2.0 Flash achieved moderate scores. ChatGPT-4 and Microsoft CoPilot scored the lowest due to limited reliability and misinformation.

Conclusions: LLMs show promise in generating patient education material but vary in reliability and accuracy. Claude 3.7 Sonnet was the best-performing LLM, though none matched the quality of the clinician-generated leaflet. LLM-generated leaflets therefore require clinician oversight before safe clinical use.
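The readability metrics named in the Methods follow standard published formulas, and the sketch below shows how they could be computed from raw text counts. It assumes that FKG denotes the Flesch-Kincaid Grade level, FRE the Flesch Reading Ease score and ARI the Automated Readability Index. The function names are illustrative, and `weighted_score` is only a hypothetical stand-in for the study's weighted scoring system: the abstract does not state the actual weights or normalisation used.

```python
from typing import Mapping


def _check(words: int, sentences: int) -> None:
    # All four formulas divide by word and sentence counts.
    if words <= 0 or sentences <= 0:
        raise ValueError("words and sentences must be positive")


def flesch_reading_ease(words: int, sentences: int, syllables: int) -> float:
    """Flesch Reading Ease (FRE): higher means easier to read."""
    _check(words, sentences)
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)


def flesch_kincaid_grade(words: int, sentences: int, syllables: int) -> float:
    """Flesch-Kincaid Grade level: approximate US school grade."""
    _check(words, sentences)
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59


def automated_readability_index(chars: int, words: int, sentences: int) -> float:
    """Automated Readability Index (ARI), based on characters per word."""
    _check(words, sentences)
    return 4.71 * (chars / words) + 0.5 * (words / sentences) - 21.43


def gunning_fog(words: int, sentences: int, complex_words: int) -> float:
    """Gunning Fog index; complex words have three or more syllables."""
    _check(words, sentences)
    return 0.4 * ((words / sentences) + 100 * (complex_words / words))


def weighted_score(scores: Mapping[str, float], weights: Mapping[str, float]) -> float:
    """Hypothetical composite: weighted mean of scores already scaled to
    the range 0..1, returned as a percentage. The weights are assumptions,
    not the values used in the study."""
    total_weight = sum(weights[name] for name in scores)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    weighted_sum = sum(scores[name] * weights[name] for name in scores)
    return 100.0 * weighted_sum / total_weight


if __name__ == "__main__":
    # Made-up counts for illustration only, not data from the study.
    print(round(flesch_reading_ease(100, 5, 150), 3))
    print(round(flesch_kincaid_grade(100, 5, 150), 2))
    print(round(automated_readability_index(450, 100, 5), 3))
    print(round(gunning_fog(100, 5, 10), 1))
```

The functions take counts rather than raw text because syllable and sentence detection are heuristic and differ between tools, which is one reason readability scores for the same leaflet can vary across calculators.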

Source journal: European Journal of Ophthalmology
CiteScore: 3.60
Self-citation rate: 0.00%
Articles published: 372
Review time: 3-8 weeks
About the journal: The European Journal of Ophthalmology was founded in 1991 and is issued in print bi-monthly. It publishes only peer-reviewed original research reporting clinical observations and laboratory investigations with clinical relevance, focusing on new diagnostic and surgical techniques, instrument and therapy updates, results of clinical trials and research findings.