BadCodePrompt: backdoor attacks against prompt engineering of large language models for code generation

IF 3.1 · Zone 2 (Computer Science) · Q3 COMPUTER SCIENCE, SOFTWARE ENGINEERING

Automated Software Engineering Pub Date : 2025-01-28 DOI:10.1007/s10515-024-00485-2

Yubin Qu, Song Huang, Yanzhou Li, Tongtong Bai, Xiang Chen, Xingya Wang, Long Li, Yongming Yao

Citations: 0

Abstract

Few-shot demonstrations in prompts significantly improve the output quality of large language models (LLMs), including for code generation. However, adversarial examples injected by malicious service providers through few-shot prompting expose LLMs to backdoor attacks. No prior research has examined backdoor attacks on LLMs in the few-shot prompting setting for code generation tasks. In this paper, we propose BadCodePrompt, the first backdoor attack on code generation that targets LLMs in the few-shot prompting scenario. It requires no access to training data or model parameters and incurs lower computational overhead. BadCodePrompt inserts triggers and poisonous code patterns into the demonstration examples, so that the model outputs poisonous source code whenever the end user’s query prompt contains the backdoor trigger. We demonstrate the effectiveness of BadCodePrompt against three LLMs (GPT-4, Claude-3.5-Sonnet, and Gemini Pro-1.5) on code generation tasks, without affecting the functionality of the generated code. LLMs with stronger reasoning capabilities are also more vulnerable to BadCodePrompt, with an average attack success rate of up to 98.53% for GPT-4 across two benchmark tasks. Finally, we apply state-of-the-art defenses against backdoor attacks in prompt engineering and show that they are largely ineffective against BadCodePrompt. BadCodePrompt therefore remains a serious threat to LLMs, underscoring the urgency of developing effective defense mechanisms.
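The abstract reports an average attack success rate over two benchmark tasks but does not define the metric. As a minimal sketch, assuming the rate is the fraction of outputs for triggered queries that contain the target code pattern, and that the average is an unweighted mean over benchmarks, the computation might look like the following. The function names, benchmark names, and the harmless `# POISON_MARKER` placeholder are hypothetical and are not taken from the paper.

```python
from typing import Dict, Iterable, List


def attack_success_rate(outputs: Iterable[str], target_pattern: str) -> float:
    """Fraction of outputs (all from triggered queries) containing the pattern.

    Assumption: a plain substring match stands in for whatever detection
    procedure the paper actually uses.
    """
    collected = list(outputs)
    if not collected:
        raise ValueError("no outputs to score")
    hits = sum(1 for text in collected if target_pattern in text)
    return hits / len(collected)


def average_asr(per_benchmark: Dict[str, List[str]], target_pattern: str) -> float:
    """Unweighted mean of the per-benchmark success rates (an assumption)."""
    if not per_benchmark:
        raise ValueError("no benchmarks to score")
    rates = [
        attack_success_rate(outputs, target_pattern)
        for outputs in per_benchmark.values()
    ]
    return sum(rates) / len(rates)


# Hypothetical model outputs for triggered queries on two benchmarks.
MARKER = "# POISON_MARKER"  # harmless placeholder for a target pattern
results = {
    "benchmark_a": ["x = 1  # POISON_MARKER", "y = 2  # POISON_MARKER"],
    "benchmark_b": ["z = 3", "w = 4  # POISON_MARKER"],
}
print(average_asr(results, MARKER))
```

A substring match is the simplest possible detector; a real evaluation would also need to confirm that the generated code still passes its functional tests, since the abstract claims functionality is unaffected.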

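Source journal

Automated Software Engineering (Engineering & Technology: Computer Science, Software Engineering)

CiteScore

4.80

Self-citation rate

11.80%

Articles published

Review time

>12 weeks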

Journal description: This journal publishes research, tutorial papers, surveys, and accounts of significant industrial experience in the foundations, techniques, tools, and applications of automated software engineering technology. This includes the study of techniques for constructing, understanding, adapting, and modeling software artifacts and processes. Automated Software Engineering covers both automatic and collaborative systems, as well as computational models of human software engineering activities. It also presents knowledge representations and artificial intelligence techniques applicable to automated software engineering, along with formal techniques that support them or provide their theoretical foundations. The journal also includes reviews of books, software, conferences, and workshops.