Advancing Patient Education in Idiopathic Intracranial Hypertension: The Promise of Large Language Models

Qais A Dihan, Andrew D Brown, Ana T Zaldivar, Muhammad Z Chauhan, Taher K Eleiwa, Amr K Hassan, Omar Solyman, Ryan Gise, Paul H Phillips, Ahmed B Sallam, Abdelrahman M Elhusseiny

Neurology: Clinical Practice. Published online October 8, 2024 (February 2025 issue). DOI: 10.1212/CPJ.0000000000200366. Free full text: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11464234/pdf/
Abstract
Background and objectives: We evaluated the performance of 3 large language models (LLMs) in generating patient education materials (PEMs) and enhancing the readability of prewritten PEMs on idiopathic intracranial hypertension (IIH).
Methods: This cross-sectional comparative study compared 3 LLMs, ChatGPT-3.5, ChatGPT-4, and Google Bard, on their ability to generate PEMs on IIH using 3 prompts. Prompt A (control prompt): "Can you write a patient-targeted health information handout on idiopathic intracranial hypertension that is easily understandable by the average American?"; Prompt B (modifier statement + control prompt): "Given patient education materials are recommended to be written at a 6th-grade reading level, using the SMOG readability formula, can you write a patient-targeted health information handout on idiopathic intracranial hypertension that is easily understandable by the average American?"; and Prompt C: "Given patient education materials are recommended to be written at a 6th-grade reading level, using the SMOG readability formula, can you rewrite the following text to a 6th-grade reading level: [insert text]." We compared generated and rewritten PEMs, along with the first 20 eligible PEMs on IIH retrieved via Google search, on readability (Simple Measure of Gobbledygook [SMOG] and Flesch-Kincaid Grade Level [FKGL]), quality (DISCERN and Patient Education Materials Assessment Tool [PEMAT]), and accuracy (Likert misinformation scale).
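Both readability metrics named above are standard, published formulas and can be reproduced directly. The sketch below (not part of the study; it uses a naive vowel-group syllable counter, so scores are approximate, and all function names are illustrative) shows how a SMOG grade and a Flesch-Kincaid Grade Level might be computed for a PEM's text.

```python
import math
import re

def split_sentences(text):
    # Crude sentence splitter on ., !, ? -- adequate for a sketch.
    return [s for s in re.split(r"[.!?]+", text) if s.strip()]

def count_syllables(word):
    # Approximate syllables as runs of consecutive vowels (rough heuristic).
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def smog_grade(text):
    sentences = split_sentences(text)
    words = re.findall(r"[A-Za-z']+", text)
    polysyllables = sum(1 for w in words if count_syllables(w) >= 3)
    # SMOG formula (McLaughlin, 1969), normalized to a 30-sentence sample.
    return 1.0430 * math.sqrt(polysyllables * (30 / len(sentences))) + 3.1291

def fkgl(text):
    sentences = split_sentences(text)
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    # Flesch-Kincaid Grade Level formula.
    return 0.39 * (len(words) / len(sentences)) + 11.8 * (syllables / len(words)) - 15.59

pem_text = "Idiopathic intracranial hypertension means high pressure around the brain. It can cause headaches and vision problems."
print(f"SMOG: {smog_grade(pem_text):.1f}, FKGL: {fkgl(pem_text):.1f}")
```

Lower grades indicate easier text; the 6th-grade target in Prompts B and C corresponds to a SMOG or FKGL score of about 6 or less.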
Results: Generated PEMs were of high quality, understandability, and accuracy (median DISCERN score ≥4, PEMAT understandability ≥70%, Likert misinformation scale = 1). Only ChatGPT-4 generated PEMs at the specified 6th-grade reading level (SMOG: 5.5 ± 0.6, FKGL: 5.6 ± 0.7). With Prompt C, only ChatGPT-4 rewrote original published PEMs to below a 6th-grade reading level without a decrease in quality, understandability, or accuracy (SMOG: 5.6 ± 0.6, FKGL: 5.7 ± 0.8, p < 0.001; DISCERN ≥4, Likert misinformation = 1).
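The abstract does not state which statistical test yielded the p < 0.001 comparison of original versus rewritten readability. As one plausible approach only, a paired nonparametric test on per-document SMOG grades could look like the sketch below; the score arrays are invented placeholders, not the study's data.

```python
from scipy.stats import wilcoxon

# Hypothetical per-document SMOG grades (placeholders, not study data).
smog_original = [11.2, 12.5, 10.8, 13.1, 11.9, 12.0, 10.5, 11.7]
smog_rewritten = [5.4, 5.9, 5.1, 6.0, 5.6, 5.8, 5.3, 5.7]

# Paired one-sided test: did rewriting with Prompt C lower the reading grade level?
stat, p_value = wilcoxon(smog_original, smog_rewritten, alternative="greater")
print(f"Wilcoxon statistic = {stat:.1f}, p = {p_value:.4f}")
```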
Discussion: In conclusion, LLMs, particularly ChatGPT-4, can produce high-quality, readable PEMs on IIH. They can also serve as supplementary tools to improve the readability of prewritten PEMs while maintaining quality and accuracy.