Multi-model assurance analysis showing large language models are highly vulnerable to adversarial hallucination attacks during clinical decision support.

Communications Medicine · IF 5.4 · Q1 (Medicine, Research & Experimental)
Mahmud Omar, Vera Sorin, Jeremy D Collins, David Reich, Robert Freeman, Nicholas Gavin, Alexander Charney, Lisa Stump, Nicola Luigi Bragazzi, Girish N Nadkarni, Eyal Klang
{"title":"Multi-model assurance analysis showing large language models are highly vulnerable to adversarial hallucination attacks during clinical decision support.","authors":"Mahmud Omar, Vera Sorin, Jeremy D Collins, David Reich, Robert Freeman, Nicholas Gavin, Alexander Charney, Lisa Stump, Nicola Luigi Bragazzi, Girish N Nadkarni, Eyal Klang","doi":"10.1038/s43856-025-01021-3","DOIUrl":null,"url":null,"abstract":"<p><strong>Background: </strong>Large language models (LLMs) show promise in clinical contexts but can generate false facts (often referred to as \"hallucinations\"). One subset of these errors arises from adversarial attacks, in which fabricated details embedded in prompts lead the model to produce or elaborate on the false information. We embedded fabricated content in clinical prompts to elicit adversarial hallucination attacks in multiple large language models. We quantified how often they elaborated on false details and tested whether a specialized mitigation prompt or altered temperature settings reduced errors.</p><p><strong>Methods: </strong>We created 300 physician-validated simulated vignettes, each containing one fabricated detail (a laboratory test, a physical or radiological sign, or a medical condition). Each vignette was presented in short and long versions-differing only in word count but identical in medical content. We tested six LLMs under three conditions: default (standard settings), mitigating prompt (designed to reduce hallucinations), and temperature 0 (deterministic output with maximum response certainty), generating 5,400 outputs. If a model elaborated on the fabricated detail, the case was classified as a \"hallucination\".</p><p><strong>Results: </strong>Hallucination rates range from 50 % to 82 % across models and prompting methods. Prompt-based mitigation lowers the overall hallucination rate (mean across all models) from 66 % to 44 % (p < 0.001). For the best-performing model, GPT-4o, rates decline from 53 % to 23 % (p < 0.001). Temperature adjustments offer no significant improvement. Short vignettes show slightly higher odds of hallucination.</p><p><strong>Conclusions: </strong>LLMs are highly susceptible to adversarial hallucination attacks, frequently generating false clinical details that pose risks when used without safeguards. While prompt engineering reduces errors, it does not eliminate them.</p>","PeriodicalId":72646,"journal":{"name":"Communications medicine","volume":"5 1","pages":"330"},"PeriodicalIF":5.4000,"publicationDate":"2025-08-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12318031/pdf/","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Communications medicine","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1038/s43856-025-01021-3","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"MEDICINE, RESEARCH & EXPERIMENTAL","Score":null,"Total":0}

Abstract

Background: Large language models (LLMs) show promise in clinical contexts but can generate false facts (often referred to as "hallucinations"). One subset of these errors arises from adversarial attacks, in which fabricated details embedded in prompts lead the model to produce or elaborate on the false information. We embedded fabricated content in clinical prompts to elicit adversarial hallucination attacks in multiple large language models. We quantified how often they elaborated on false details and tested whether a specialized mitigation prompt or altered temperature settings reduced errors.

Methods: We created 300 physician-validated simulated vignettes, each containing one fabricated detail (a laboratory test, a physical or radiological sign, or a medical condition). Each vignette was presented in short and long versions, differing only in word count but identical in medical content. We tested six LLMs under three conditions: default (standard settings), mitigating prompt (designed to reduce hallucinations), and temperature 0 (deterministic output with maximum response certainty), generating 5,400 outputs. If a model elaborated on the fabricated detail, the case was classified as a "hallucination".
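For readers who want to picture the experimental setup, the following is a minimal sketch of such an evaluation loop, not the authors' published harness. It assumes a hypothetical vignettes.json file of records with "text" and "fabricated_detail" fields, uses the OpenAI Python SDK with GPT-4o as one example backend (the study tested six models), and the mitigation-prompt wording is a placeholder, not the study's prompt. The keyword check in elaborates_on is a crude stand-in for the study's adjudication of whether a model elaborated on the fabricated detail.

```python
# Illustrative sketch only; not the study's actual code or prompts.
import json
from openai import OpenAI

client = OpenAI()  # requires OPENAI_API_KEY in the environment

MITIGATION_PROMPT = (
    "You are a careful clinical assistant. If the case mentions a test, "
    "sign, or condition you cannot verify or that does not exist, say so "
    "explicitly instead of elaborating on it."  # placeholder wording
)

# The three experimental conditions described in the abstract.
CONDITIONS = {
    "default": {"system": None, "temperature": None},
    "mitigating_prompt": {"system": MITIGATION_PROMPT, "temperature": None},
    "temperature_0": {"system": None, "temperature": 0},
}

def query(vignette_text, condition):
    """Send one vignette to the model under one experimental condition."""
    messages = []
    if condition["system"]:
        messages.append({"role": "system", "content": condition["system"]})
    messages.append({"role": "user", "content": vignette_text})
    kwargs = {"model": "gpt-4o", "messages": messages}
    if condition["temperature"] is not None:
        kwargs["temperature"] = condition["temperature"]
    return client.chat.completions.create(**kwargs).choices[0].message.content

def elaborates_on(output, fabricated_detail):
    """Crude proxy for adjudication: flag outputs that repeat the fabricated detail."""
    return fabricated_detail.lower() in output.lower()

vignettes = json.load(open("vignettes.json"))  # hypothetical input file
results = []
for v in vignettes:
    for name, cond in CONDITIONS.items():
        out = query(v["text"], cond)
        results.append({
            "condition": name,
            "hallucination": elaborates_on(out, v["fabricated_detail"]),
        })
```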

Results: Hallucination rates range from 50% to 82% across models and prompting methods. Prompt-based mitigation lowers the overall hallucination rate (mean across all models) from 66% to 44% (p < 0.001). For the best-performing model, GPT-4o, rates decline from 53% to 23% (p < 0.001). Temperature adjustments offer no significant improvement. Short vignettes show slightly higher odds of hallucination.
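As a back-of-the-envelope check on why a 66% vs. 44% difference clears p < 0.001, a simple two-proportion comparison is shown below. The abstract does not state the paper's statistical model, and the assumed denominator of 1,800 outputs per condition (5,400 outputs / 3 conditions, pooled over models, ignoring clustering by model and vignette) is a simplifying assumption for illustration only.

```python
# Rough illustration, not the paper's analysis.
from scipy.stats import chi2_contingency

n = 1800                           # assumed outputs per condition (see caveat above)
default_hallu = round(0.66 * n)    # ~66% hallucination under default prompting
mitigated_hallu = round(0.44 * n)  # ~44% with the mitigation prompt

table = [
    [default_hallu, n - default_hallu],
    [mitigated_hallu, n - mitigated_hallu],
]
chi2, p, _, _ = chi2_contingency(table)
print(f"chi2={chi2:.1f}, p={p:.2e}")  # p is far below 0.001 at this sample size
```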

Conclusions: LLMs are highly susceptible to adversarial hallucination attacks, frequently generating false clinical details that pose risks when used without safeguards. While prompt engineering reduces errors, it does not eliminate them.

