The use of a large language model to create plain language summaries of evidence reviews in healthcare: A feasibility study

Colleen Ovelman, Shannon Kugley, Gerald Gartlehner, Meera Viswanathan
Cochrane Evidence Synthesis and Methods, 2(2). Published 2024-02-04. DOI: 10.1002/cesm.12041. https://onlinelibrary.wiley.com/doi/10.1002/cesm.12041

Abstract

Introduction

Plain language summaries (PLSs) make complex healthcare evidence accessible to patients and the public. Large language models (LLMs) may assist in generating accurate, readable PLSs. This study explored using the LLM Claude 2 to create PLSs of evidence reviews from the Agency for Healthcare Research and Quality (AHRQ) Effective Health Care Program.

Methods

We selected 10 evidence reviews published from 2021 to 2023, representing a range of methods and topics. We iteratively developed a prompt to guide Claude 2 in creating PLSs; the prompt included specifications for plain language, reading level, length, organizational structure, active voice, and inclusive language. We assessed the PLSs for adherence to the prompt specifications, comprehensiveness, accuracy, readability, and cultural sensitivity.

Results

All PLSs met the word-count specification. We judged one PLS as fully comprehensive and seven as mostly comprehensive. We judged two PLSs as fully capturing the PICO (population, intervention, comparator, outcome) elements and five as having minor PICO errors. We judged three PLSs as reporting the results accurately and four as having minor result errors. We judged three PLSs as having major result errors because they reported the total number of participants incorrectly. Five PLSs met the target reading level of 6th to 8th grade. Passive voice use averaged 16%. All PLSs used inclusive language.
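The abstract reports readability as a school grade level (target: 6th to 8th grade) but does not say which readability formula was used. As a minimal sketch only, assuming the widely used Flesch-Kincaid Grade Level formula (0.39 × words per sentence + 11.8 × syllables per word − 15.59), a check against that target might look like the following. The function names are hypothetical, and the vowel-group syllable counter is a crude heuristic rather than the tool used in the study.

```python
import re


def count_syllables(word: str) -> int:
    """Crude heuristic (assumption): count vowel groups, drop a silent final 'e'."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    # Treat a final 'e' as silent unless the word ends in 'le' (e.g. "table").
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def flesch_kincaid_grade(text: str) -> float:
    """Flesch-Kincaid Grade Level, assumed here as the readability measure."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    words_per_sentence = len(words) / len(sentences)
    syllables_per_word = syllables / len(words)
    return 0.39 * words_per_sentence + 11.8 * syllables_per_word - 15.59


def meets_target(text: str, low: float = 6.0, high: float = 8.0) -> bool:
    """Hypothetical check: is the estimated grade inside the 6th to 8th grade band?"""
    return low <= flesch_kincaid_grade(text) <= high


if __name__ == "__main__":
    sample = "The cat sat on the mat."
    print(round(flesch_kincaid_grade(sample), 2), meets_target(sample))
```

Because grade-level formulas reward short sentences and short words, a very simple sentence can score below the target band; in practice a reviewer would also need to confirm that a shorter PLS stays accurate and comprehensive.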

Conclusions

LLMs show promise for assisting in PLS creation but likely require human input to ensure accuracy, comprehensiveness, and appropriate nuance in interpretation. Iterative prompt refinement may improve results and address the needs of specific reviews and audiences. Because they were text-only, the AI-generated PLSs could not meet all consumer communication criteria, such as textual design and visual representation. Further testing should include consumer reviewers and explore how best to use LLM support in drafting PLS text for complex evidence reviews.
