The Use of Large Language Models in Generating Patient Education Materials: A Scoping Review

Alhasan AlSammarraie, Mowafa Househ
{"title":"The Use of Large Language Models in Generating Patient Education Materials: a Scoping Review.","authors":"Alhasan AlSammarraie, Mowafa Househ","doi":"10.5455/aim.2024.33.4-10","DOIUrl":null,"url":null,"abstract":"<p><strong>Background: </strong>Patient Education is a healthcare concept that involves educating the public with evidence-based medical information. This information surges their capabilities to promote a healthier life and better manage their conditions. LLM platforms have recently been introduced as powerful NLPs capable of producing human-sounding text and by extension patient education materials.</p><p><strong>Objective: </strong>This study aims to conduct a scoping review to systematically map the existing literature on the use of LLMs for generating patient education materials.</p><p><strong>Methods: </strong>The study followed JBI guidelines, searching five databases using set inclusion/exclusion criteria. A RAG-inspired framework was employed to extract the variables followed by a manual check to verify accuracy of extractions. In total, 21 variables were identified and grouped into five themes: Study Demographics, LLM Characteristics, Prompt-Related Variables, PEM Assessment, and Comparative Outcomes.</p><p><strong>Results: </strong>Results were reported from 69 studies. The United States contributed the largest number of studies. LLM models such as ChatGPT-4, ChatGPT-3.5, and Bard were the most investigated. Most studies evaluated the accuracy of LLM responses and the readability of LLM responses. Only 3 studies implemented external knowledge bases leveraging a RAG architecture. All studies except 3 conducted prompting in English. ChatGPT-4 was found to provide the most accurate responses in comparison with other models.</p><p><strong>Conclusion: </strong>This review examined studies comparing large language models for generating patient education materials. ChatGPT-3.5 and ChatGPT-4 were the most evaluated. Accuracy and readability of responses were the main metrics of evaluation, while few studies used assessment frameworks, retrieval-augmented methods, or explored non-English cases.</p>","PeriodicalId":7074,"journal":{"name":"Acta Informatica Medica","volume":"33 1","pages":"4-10"},"PeriodicalIF":0.0000,"publicationDate":"2025-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11986337/pdf/","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Acta Informatica Medica","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.5455/aim.2024.33.4-10","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"Medicine","Score":null,"Total":0}
引用次数: 0

Abstract

Background: Patient education is a healthcare practice that involves providing the public with evidence-based medical information. This information strengthens people's ability to lead healthier lives and better manage their conditions. Large language model (LLM) platforms have recently been introduced as powerful natural language processing systems capable of producing human-like text and, by extension, patient education materials.

Objective: This study aims to conduct a scoping review to systematically map the existing literature on the use of LLMs for generating patient education materials.

Methods: The review followed JBI guidelines, searching five databases using predefined inclusion/exclusion criteria. A retrieval-augmented generation (RAG)-inspired framework was employed to extract the variables, followed by a manual check to verify the accuracy of the extractions. In total, 21 variables were identified and grouped into five themes: Study Demographics, LLM Characteristics, Prompt-Related Variables, PEM Assessment, and Comparative Outcomes.
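
To make the extraction step concrete, the sketch below shows what a minimal RAG-inspired variable-extraction loop could look like. It is not the authors' pipeline: the character-based chunking, the TF-IDF retrieval used as a stand-in for embedding search, the example variable list, and the `ask_llm` wrapper are all assumptions for illustration only.

```python
# Illustrative sketch (not the review's actual code) of RAG-inspired variable extraction.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical subset of the 21 extracted variables.
VARIABLES = ["country of origin", "LLM model evaluated", "readability metric used"]

def split_into_chunks(text: str, size: int = 800) -> list[str]:
    """Split a study's full text into fixed-size character chunks."""
    return [text[i:i + size] for i in range(0, len(text), size)]

def retrieve(chunks: list[str], query: str, k: int = 3) -> list[str]:
    """Return the k chunks most similar to the query (TF-IDF as a stand-in for embeddings)."""
    vec = TfidfVectorizer().fit(chunks + [query])
    scores = cosine_similarity(vec.transform([query]), vec.transform(chunks))[0]
    top = scores.argsort()[::-1][:k]
    return [chunks[i] for i in top]

def extract_variable(study_text: str, variable: str, ask_llm) -> str:
    """Retrieve context relevant to one variable and ask an LLM to extract its value."""
    context = "\n".join(retrieve(split_into_chunks(study_text), variable))
    prompt = (f"Using only the context below, report the study's {variable}. "
              f"Answer 'not reported' if it is absent.\n\nContext:\n{context}")
    # The returned value would then be checked manually, as described in the Methods.
    return ask_llm(prompt)
```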

Results: Results were reported from 69 studies. The United States contributed the largest number of studies. Models such as ChatGPT-4, ChatGPT-3.5, and Bard were the most frequently investigated. Most studies evaluated the accuracy and readability of LLM responses. Only three studies implemented external knowledge bases leveraging a RAG architecture. All but three studies conducted prompting in English. ChatGPT-4 provided the most accurate responses compared with the other models.
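
Since readability was one of the main outcomes assessed, the snippet below illustrates how an LLM response might be scored with standard readability formulas. It is an assumption for illustration, not taken from the reviewed studies: the `textstat` package and the sample response text are hypothetical.

```python
# Illustrative sketch: scoring the readability of an LLM-generated patient education text.
import textstat

# Hypothetical LLM output used only as example input.
response = ("Cataract surgery replaces the cloudy lens of your eye with a clear "
            "artificial lens. Most people go home the same day.")

# Two commonly reported readability formulas.
print("Flesch Reading Ease:", textstat.flesch_reading_ease(response))
print("Flesch-Kincaid Grade Level:", textstat.flesch_kincaid_grade(response))
```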

Conclusion: This review examined studies comparing large language models for generating patient education materials. ChatGPT-3.5 and ChatGPT-4 were the most frequently evaluated models. Accuracy and readability of responses were the main evaluation metrics, while few studies used structured assessment frameworks, retrieval-augmented methods, or explored non-English use cases.
