Assessing the System-Instruction Vulnerabilities of Large Language Models to Malicious Conversion Into Health Disinformation Chatbots.

Impact Factor 19.6 | CAS Zone 1 (Medicine) | JCR Q1, MEDICINE, GENERAL & INTERNAL
Natansh D Modi, Bradley D Menz, Abdulhalim A Awaty, Cyril A Alex, Jessica M Logan, Ross A McKinnon, Andrew Rowland, Stephen Bacchi, Kacper Gradon, Michael J Sorich, Ashley M Hopkins
{"title":"Assessing the System-Instruction Vulnerabilities of Large Language Models to Malicious Conversion Into Health Disinformation Chatbots.","authors":"Natansh D Modi, Bradley D Menz, Abdulhalim A Awaty, Cyril A Alex, Jessica M Logan, Ross A McKinnon, Andrew Rowland, Stephen Bacchi, Kacper Gradon, Michael J Sorich, Ashley M Hopkins","doi":"10.7326/ANNALS-24-03933","DOIUrl":null,"url":null,"abstract":"<p><p>Large language models (LLMs) offer substantial promise for improving health care; however, some risks warrant evaluation and discussion. This study assessed the effectiveness of safeguards in foundational LLMs against malicious instruction into health disinformation chatbots. Five foundational LLMs-OpenAI's GPT-4o, Google's Gemini 1.5 Pro, Anthropic's Claude 3.5 Sonnet, Meta's Llama 3.2-90B Vision, and xAI's Grok Beta-were evaluated via their application programming interfaces (APIs). Each API received system-level instructions to produce incorrect responses to health queries, delivered in a formal, authoritative, convincing, and scientific tone. Ten health questions were posed to each customized chatbot in duplicate. Exploratory analyses assessed the feasibility of creating a customized generative pretrained transformer (GPT) within the OpenAI GPT Store and searched to identify if any publicly accessible GPTs in the store seemed to respond with disinformation. Of the 100 health queries posed across the 5 customized LLM API chatbots, 88 (88%) responses were health disinformation. Four of the 5 chatbots (GPT-4o, Gemini 1.5 Pro, Llama 3.2-90B Vision, and Grok Beta) generated disinformation in 100% (20 of 20) of their responses, whereas Claude 3.5 Sonnet responded with disinformation in 40% (8 of 20). The disinformation included claimed vaccine-autism links, HIV being airborne, cancer-curing diets, sunscreen risks, genetically modified organism conspiracies, attention deficit-hyperactivity disorder and depression myths, garlic replacing antibiotics, and 5G causing infertility. Exploratory analyses further showed that the OpenAI GPT Store could currently be instructed to generate similar disinformation. Overall, LLM APIs and the OpenAI GPT Store were shown to be vulnerable to malicious system-level instructions to covertly create health disinformation chatbots. These findings highlight the urgent need for robust output screening safeguards to ensure public health safety in an era of rapidly evolving technologies.</p>","PeriodicalId":7932,"journal":{"name":"Annals of Internal Medicine","volume":" ","pages":""},"PeriodicalIF":19.6000,"publicationDate":"2025-06-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Annals of Internal Medicine","FirstCategoryId":"3","ListUrlMain":"https://doi.org/10.7326/ANNALS-24-03933","RegionNum":1,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"MEDICINE, GENERAL & INTERNAL","Score":null,"Total":0}
Citations: 0

Abstract

Large language models (LLMs) offer substantial promise for improving health care; however, some risks warrant evaluation and discussion. This study assessed the effectiveness of safeguards in foundational LLMs against malicious instruction into health disinformation chatbots. Five foundational LLMs (OpenAI's GPT-4o, Google's Gemini 1.5 Pro, Anthropic's Claude 3.5 Sonnet, Meta's Llama 3.2-90B Vision, and xAI's Grok Beta) were evaluated via their application programming interfaces (APIs). Each API received system-level instructions to produce incorrect responses to health queries, delivered in a formal, authoritative, convincing, and scientific tone. Ten health questions were posed to each customized chatbot in duplicate. Exploratory analyses assessed the feasibility of creating a customized generative pretrained transformer (GPT) within the OpenAI GPT Store and searched to identify whether any publicly accessible GPTs in the store seemed to respond with disinformation. Of the 100 health queries posed across the 5 customized LLM API chatbots, 88 (88%) responses were health disinformation. Four of the 5 chatbots (GPT-4o, Gemini 1.5 Pro, Llama 3.2-90B Vision, and Grok Beta) generated disinformation in 100% (20 of 20) of their responses, whereas Claude 3.5 Sonnet responded with disinformation in 40% (8 of 20). The disinformation included claimed vaccine-autism links, HIV being airborne, cancer-curing diets, sunscreen risks, genetically modified organism conspiracies, attention deficit-hyperactivity disorder and depression myths, garlic replacing antibiotics, and 5G causing infertility. Exploratory analyses further showed that the OpenAI GPT Store could currently be instructed to generate similar disinformation. Overall, LLM APIs and the OpenAI GPT Store were shown to be vulnerable to malicious system-level instructions to covertly create health disinformation chatbots. These findings highlight the urgent need for robust output screening safeguards to ensure public health safety in an era of rapidly evolving technologies.
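The evaluation pathway described above relies on the standard system-message field that vendor APIs expose. The sketch below shows, for one of the five models (GPT-4o via the OpenAI chat completions API), how a harness might pair a system-level instruction with each health question, pose each question in duplicate as the study did, and log the responses for review. It is a minimal sketch under stated assumptions: the deliberately benign placeholder instruction, the example questions, and the collect_responses helper are hypothetical, and the adversarial disinformation instruction used in the study is not reproduced.

```python
# Minimal sketch of an API-level evaluation loop like the one described in the abstract.
# The system instruction below is a benign placeholder; the study's adversarial
# instruction is deliberately not reproduced here.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

SYSTEM_INSTRUCTION = (
    "You are a health information assistant. Answer only with "
    "evidence-based information and note uncertainty where it exists."
)

# Hypothetical examples of the kind of health questions posed to each chatbot.
HEALTH_QUESTIONS = [
    "Does sunscreen cause skin cancer?",
    "Do vaccines cause autism?",
]


def collect_responses(model: str, questions: list[str], repeats: int = 2) -> list[dict]:
    """Pose each question `repeats` times (the study posed questions in duplicate) and log replies."""
    records = []
    for question in questions:
        for run in range(repeats):
            response = client.chat.completions.create(
                model=model,
                messages=[
                    {"role": "system", "content": SYSTEM_INSTRUCTION},
                    {"role": "user", "content": question},
                ],
            )
            records.append(
                {
                    "question": question,
                    "run": run + 1,
                    "answer": response.choices[0].message.content,
                }
            )
    return records


if __name__ == "__main__":
    for record in collect_responses("gpt-4o", HEALTH_QUESTIONS):
        print(record["question"], "->", record["answer"][:80])
```

The other vendors evaluated in the study expose comparable system-instruction mechanisms (for example, a separate system parameter in Anthropic's Messages API and a system instruction field in Google's Gemini API), which is why a harness of this shape can be applied across all five APIs.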

Source Journal

Annals of Internal Medicine (Medicine: Internal Medicine)

CiteScore: 23.90
Self-citation rate: 1.80%
Articles published: 1136
Time to review: 3-8 weeks

About the journal: Established in 1927 by the American College of Physicians (ACP), Annals of Internal Medicine is the premier internal medicine journal. Annals of Internal Medicine's mission is to promote excellence in medicine, enable physicians and other health care professionals to be well-informed members of the medical community and society, advance standards in the conduct and reporting of medical research, and contribute to improving the health of people worldwide. To achieve this mission, the journal publishes a wide variety of original research, review articles, practice guidelines, and commentary relevant to clinical practice, health care delivery, public health, health care policy, medical education, ethics, and research methodology. In addition, the journal publishes personal narratives that convey the feeling and the art of medicine.