Evaluating evidence-based health information from generative AI using a cross-sectional study with laypeople seeking screening information

IF 12.4 · Zone 1 (Medicine) · Q1 HEALTH CARE SCIENCES & SERVICES

NPJ Digital Medicine · Pub Date: 2025-06-09 · DOI: 10.1038/s41746-025-01752-6

Felix G. Rebitschek, Alessandra Carella, Silja Kohlrausch-Pazin, Michael Zitzmann, Anke Steckelberg, Christoph Wilhelm

Citations: 0

Abstract

Large language models (LLMs) are used to seek health information. Guidelines for evidence-based health communication require the presentation of the best available evidence to support informed decision-making. We investigate the prompt-dependent guideline compliance of LLMs and evaluate a minimal behavioural intervention for boosting laypeople’s prompting. Study 1 systematically varied prompt informedness, topic, and LLMs to evaluate compliance. Study 2 randomized 300 participants to three LLMs under standard or boosted prompting conditions. Blinded raters assessed LLM responses with two instruments. Study 1 found that LLMs failed evidence-based health communication standards. The quality of responses was found to be contingent upon prompt informedness. Study 2 revealed that laypeople frequently generated poor-quality responses. The simple boost improved response quality, though it remained below required standards. These findings underscore the inadequacy of LLMs as a standalone health communication tool. Integrating LLMs with evidence-based frameworks, enhancing their reasoning and interfaces, and teaching prompting are essential. Study Registration: German Clinical Trials Register (DRKS) (Reg. No.: DRKS00035228, registered on 15 October 2024).

Source journal

NPJ Digital Medicine Multiple-

CiteScore: 25.10

Self-citation rate: 3.30%

Articles published: 170

Review time: 15 weeks

About the journal: npj Digital Medicine is an online open-access journal that focuses on publishing peer-reviewed research in the field of digital medicine. The journal covers various aspects of digital medicine, including the application and implementation of digital and mobile technologies in clinical settings, virtual healthcare, and the use of artificial intelligence and informatics. The primary goal of the journal is to support innovation and the advancement of healthcare through the integration of new digital and mobile technologies. When determining if a manuscript is suitable for publication, the journal considers four important criteria: novelty, clinical relevance, scientific rigor, and digital innovation.