Evaluating evidence-based health information from generative AI using a cross-sectional study with laypeople seeking screening information

IF 12.4 | Q1 (JCR) | Region 1, Medicine (CAS) | HEALTH CARE SCIENCES & SERVICES
Felix G. Rebitschek, Alessandra Carella, Silja Kohlrausch-Pazin, Michael Zitzmann, Anke Steckelberg, Christoph Wilhelm
Journal: NPJ Digital Medicine
DOI: 10.1038/s41746-025-01752-6
Published: 2025-06-09
Citations: 0

Abstract

Large language models (LLMs) are used to seek health information. Guidelines for evidence-based health communication require the presentation of the best available evidence to support informed decision-making. We investigate the prompt-dependent guideline compliance of LLMs and evaluate a minimal behavioural intervention for boosting laypeople's prompting. Study 1 systematically varied prompt informedness, topic, and LLM to evaluate compliance. Study 2 randomized 300 participants to three LLMs under standard or boosted prompting conditions. Blinded raters assessed LLM responses with two instruments. Study 1 found that LLMs failed evidence-based health communication standards. The quality of responses was contingent upon prompt informedness. Study 2 revealed that laypeople frequently generated poor-quality responses. The simple boost improved response quality, though it remained below required standards. These findings underscore the inadequacy of LLMs as a standalone health communication tool. Integrating LLMs with evidence-based frameworks, enhancing their reasoning and interfaces, and teaching prompting are essential. Study Registration: German Clinical Trials Register (DRKS) (Reg. No.: DRKS00035228, registered on 15 October 2024).
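The Study 2 design described above is a 3 × 2 between-subjects allocation (three LLMs crossed with standard vs. boosted prompting). A minimal sketch of such a block randomization is shown below; this is a hypothetical illustration only — the arm labels, equal allocation across the six arms, and the seeding are assumptions, not the authors' actual procedure.

```python
import random
from collections import Counter

def randomize_participants(n=300,
                           llms=("LLM-A", "LLM-B", "LLM-C"),
                           conditions=("standard", "boosted"),
                           seed=42):
    """Block-randomize n participants across llm x condition arms.

    Equal allocation to all arms is assumed for illustration; the
    paper does not report the exact allocation scheme.
    """
    arms = [(llm, cond) for llm in llms for cond in conditions]
    if n % len(arms) != 0:
        raise ValueError("n must be divisible by the number of arms")
    # Build a balanced allocation list, then shuffle it reproducibly.
    allocation = arms * (n // len(arms))
    random.Random(seed).shuffle(allocation)
    return allocation

alloc = randomize_participants()
print(Counter(alloc))  # each of the 6 arms receives 50 participants
```

Blocked (rather than simple) randomization guarantees the balanced arm sizes that a fixed-n design like this requires.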


Source journal: NPJ Digital Medicine
CiteScore: 25.10
Self-citation rate: 3.30%
Annual article output: 170
Review time: 15 weeks
Journal description: npj Digital Medicine is an online open-access journal that publishes peer-reviewed research in digital medicine. The journal covers the application and implementation of digital and mobile technologies in clinical settings, virtual healthcare, and the use of artificial intelligence and informatics. Its primary goal is to support innovation and the advancement of healthcare through the integration of new digital and mobile technologies. In assessing suitability for publication, the journal considers four criteria: novelty, clinical relevance, scientific rigor, and digital innovation.