与上下文后门攻击妥协LLM驱动的具身代理

IF 6.3 1区计算机科学 Q1 COMPUTER SCIENCE, THEORY & METHODS

IEEE Transactions on Information Forensics and Security Pub Date : 2025-03-27 DOI:10.1109/TIFS.2025.3555410

Aishan Liu;Yuguang Zhou;Xianglong Liu;Tianyuan Zhang;Siyuan Liang;Jiakai Wang;Yanjun Pu;Tianlin Li;Junqi Zhang;Wenbo Zhou;Qing Guo;Dacheng Tao

{"title":"与上下文后门攻击妥协LLM驱动的具身代理","authors":"Aishan Liu;Yuguang Zhou;Xianglong Liu;Tianyuan Zhang;Siyuan Liang;Jiakai Wang;Yanjun Pu;Tianlin Li;Junqi Zhang;Wenbo Zhou;Qing Guo;Dacheng Tao","doi":"10.1109/TIFS.2025.3555410","DOIUrl":null,"url":null,"abstract":"Large language models (LLMs) have transformed the development of embodied intelligence. By providing a few contextual demonstrations (such as rationales and solution examples) developers can utilize the extensive internal knowledge of LLMs to effortlessly translate complex tasks described in abstract language into sequences of code snippets, which will serve as the execution logic for embodied agents. However, this paper uncovers a significant backdoor security threat within this process and introduces a novel method called Contextual Backdoor Attack. By poisoning just a few contextual demonstrations, attackers can covertly compromise the contextual environment of a closed-box LLM, prompting it to generate programs with context-dependent defects. These programs appear logically sound but contain defects that can activate and induce unintended behaviors when the operational agent encounters specific triggers in its interactive environment. To compromise the LLM’s contextual environment, we employ adversarial in-context generation to optimize poisoned demonstrations, where an LLM judge evaluates these poisoned prompts, reporting to an additional LLM that iteratively optimizes the demonstration in a two-player adversarial game using chain-of-thought reasoning. To enable context-dependent behaviors in downstream agents, we implement a dual-modality activation strategy that controls both the generation and execution of program defects through textual and visual triggers. We expand the scope of our attack by developing five program defect modes that compromise key aspects of confidentiality, integrity, and availability in embodied agents. To validate the effectiveness of our approach, we conducted extensive experiments across various tasks, including robot planning, robot manipulation, and compositional visual reasoning. Additionally, we demonstrate the potential impact of our approach by successfully attacking real-world autonomous driving systems. The contextual backdoor threat introduced in this study poses serious risks for millions of downstream embodied agents, given that most publicly available LLMs are third-party-provided. This paper aims to raise awareness of this critical threat. Our code and demos are available at <uri>https://contextual-backdoor.github.io/</uri>.","PeriodicalId":13492,"journal":{"name":"IEEE Transactions on Information Forensics and Security","volume":"20 ","pages":"3979-3994"},"PeriodicalIF":6.3000,"publicationDate":"2025-03-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Compromising LLM Driven Embodied Agents With Contextual Backdoor Attacks\",\"authors\":\"Aishan Liu;Yuguang Zhou;Xianglong Liu;Tianyuan Zhang;Siyuan Liang;Jiakai Wang;Yanjun Pu;Tianlin Li;Junqi Zhang;Wenbo Zhou;Qing Guo;Dacheng Tao\",\"doi\":\"10.1109/TIFS.2025.3555410\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Large language models (LLMs) have transformed the development of embodied intelligence. By providing a few contextual demonstrations (such as rationales and solution examples) developers can utilize the extensive internal knowledge of LLMs to effortlessly translate complex tasks described in abstract language into sequences of code snippets, which will serve as the execution logic for embodied agents. However, this paper uncovers a significant backdoor security threat within this process and introduces a novel method called Contextual Backdoor Attack. By poisoning just a few contextual demonstrations, attackers can covertly compromise the contextual environment of a closed-box LLM, prompting it to generate programs with context-dependent defects. These programs appear logically sound but contain defects that can activate and induce unintended behaviors when the operational agent encounters specific triggers in its interactive environment. To compromise the LLM’s contextual environment, we employ adversarial in-context generation to optimize poisoned demonstrations, where an LLM judge evaluates these poisoned prompts, reporting to an additional LLM that iteratively optimizes the demonstration in a two-player adversarial game using chain-of-thought reasoning. To enable context-dependent behaviors in downstream agents, we implement a dual-modality activation strategy that controls both the generation and execution of program defects through textual and visual triggers. We expand the scope of our attack by developing five program defect modes that compromise key aspects of confidentiality, integrity, and availability in embodied agents. To validate the effectiveness of our approach, we conducted extensive experiments across various tasks, including robot planning, robot manipulation, and compositional visual reasoning. Additionally, we demonstrate the potential impact of our approach by successfully attacking real-world autonomous driving systems. The contextual backdoor threat introduced in this study poses serious risks for millions of downstream embodied agents, given that most publicly available LLMs are third-party-provided. This paper aims to raise awareness of this critical threat. Our code and demos are available at <uri>https://contextual-backdoor.github.io/</uri>.\",\"PeriodicalId\":13492,\"journal\":{\"name\":\"IEEE Transactions on Information Forensics and Security\",\"volume\":\"20 \",\"pages\":\"3979-3994\"},\"PeriodicalIF\":6.3000,\"publicationDate\":\"2025-03-27\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"IEEE Transactions on Information Forensics and Security\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://ieeexplore.ieee.org/document/10943262/\",\"RegionNum\":1,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"COMPUTER SCIENCE, THEORY & METHODS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Information Forensics and Security","FirstCategoryId":"94","ListUrlMain":"https://ieeexplore.ieee.org/document/10943262/","RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, THEORY & METHODS","Score":null,"Total":0}

引用次数: 0

摘要

大型语言模型（llm）已经改变了具身智能的发展。通过提供一些上下文演示（例如基本原理和解决方案示例），开发人员可以利用llm广泛的内部知识，毫不费力地将用抽象语言描述的复杂任务转换为代码片段序列，这些代码片段将作为嵌入代理的执行逻辑。然而，本文揭示了在此过程中存在的重大后门安全威胁，并引入了一种新的方法——上下文后门攻击。通过毒害几个上下文演示，攻击者可以暗中破坏封闭盒LLM的上下文环境，促使它生成具有上下文相关缺陷的程序。这些程序在逻辑上似乎是合理的，但包含一些缺陷，当操作代理在其交互环境中遇到特定的触发器时，可能会激活和诱导意外行为。为了妥协法学硕士的上下文环境，我们采用对抗性上下文生成来优化有毒演示，其中法学硕士法官评估这些有毒提示，向另外一个法学硕士报告，该法学硕士使用思维链推理迭代地优化两人对抗性博弈中的演示。为了在下游代理中启用上下文相关的行为，我们实现了一种双模态激活策略，该策略通过文本和视觉触发器控制程序缺陷的生成和执行。我们通过开发五种程序缺陷模式来扩展我们的攻击范围，这些模式损害了嵌入代理的机密性、完整性和可用性的关键方面。为了验证我们方法的有效性，我们在各种任务中进行了广泛的实验，包括机器人规划、机器人操作和组合视觉推理。此外，我们通过成功攻击现实世界的自动驾驶系统，展示了我们的方法的潜在影响。鉴于大多数公开可用的法学硕士都是第三方提供的，本研究中引入的上下文后门威胁给数百万下游嵌入代理带来了严重的风险。本文旨在提高人们对这一严重威胁的认识。我们的代码和演示可以在https://contextual-backdoor.github.io/上获得。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Compromising LLM Driven Embodied Agents With Contextual Backdoor Attacks

Large language models (LLMs) have transformed the development of embodied intelligence. By providing a few contextual demonstrations (such as rationales and solution examples) developers can utilize the extensive internal knowledge of LLMs to effortlessly translate complex tasks described in abstract language into sequences of code snippets, which will serve as the execution logic for embodied agents. However, this paper uncovers a significant backdoor security threat within this process and introduces a novel method called Contextual Backdoor Attack. By poisoning just a few contextual demonstrations, attackers can covertly compromise the contextual environment of a closed-box LLM, prompting it to generate programs with context-dependent defects. These programs appear logically sound but contain defects that can activate and induce unintended behaviors when the operational agent encounters specific triggers in its interactive environment. To compromise the LLM’s contextual environment, we employ adversarial in-context generation to optimize poisoned demonstrations, where an LLM judge evaluates these poisoned prompts, reporting to an additional LLM that iteratively optimizes the demonstration in a two-player adversarial game using chain-of-thought reasoning. To enable context-dependent behaviors in downstream agents, we implement a dual-modality activation strategy that controls both the generation and execution of program defects through textual and visual triggers. We expand the scope of our attack by developing five program defect modes that compromise key aspects of confidentiality, integrity, and availability in embodied agents. To validate the effectiveness of our approach, we conducted extensive experiments across various tasks, including robot planning, robot manipulation, and compositional visual reasoning. Additionally, we demonstrate the potential impact of our approach by successfully attacking real-world autonomous driving systems. The contextual backdoor threat introduced in this study poses serious risks for millions of downstream embodied agents, given that most publicly available LLMs are third-party-provided. This paper aims to raise awareness of this critical threat. Our code and demos are available at https://contextual-backdoor.github.io/.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

IEEE Transactions on Information Forensics and Security 工程技术-工程：电子与电气

CiteScore

14.40

自引率

7.40%

发文量

234

审稿时长

6.5 months

期刊介绍： The IEEE Transactions on Information Forensics and Security covers the sciences, technologies, and applications relating to information forensics, information security, biometrics, surveillance and systems applications that incorporate these features