{"title":"Prompt Obfuscation for Large Language Models","authors":"David Pape, Thorsten Eisenhofer, Lea Schönherr","doi":"arxiv-2409.11026","DOIUrl":null,"url":null,"abstract":"System prompts that include detailed instructions to describe the task\nperformed by the underlying large language model (LLM) can easily transform\nfoundation models into tools and services with minimal overhead. Because of\ntheir crucial impact on the utility, they are often considered intellectual\nproperty, similar to the code of a software product. However, extracting system\nprompts is easily possible by using prompt injection. As of today, there is no\neffective countermeasure to prevent the stealing of system prompts and all\nsafeguarding efforts could be evaded with carefully crafted prompt injections\nthat bypass all protection mechanisms.In this work, we propose an alternative\nto conventional system prompts. We introduce prompt obfuscation to prevent the\nextraction of the system prompt while maintaining the utility of the system\nitself with only little overhead. The core idea is to find a representation of\nthe original system prompt that leads to the same functionality, while the\nobfuscated system prompt does not contain any information that allows\nconclusions to be drawn about the original system prompt. We implement an\noptimization-based method to find an obfuscated prompt representation while\nmaintaining the functionality. To evaluate our approach, we investigate eight\ndifferent metrics to compare the performance of a system using the original and\nthe obfuscated system prompts, and we show that the obfuscated version is\nconstantly on par with the original one. We further perform three different\ndeobfuscation attacks and show that with access to the obfuscated prompt and\nthe LLM itself, we are not able to consistently extract meaningful information.\nOverall, we showed that prompt obfuscation can be an effective method to\nprotect intellectual property while maintaining the same utility as the\noriginal system prompt.","PeriodicalId":501332,"journal":{"name":"arXiv - CS - Cryptography and Security","volume":"47 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2024-09-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"arXiv - CS - Cryptography and Security","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/arxiv-2409.11026","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Abstract
System prompts that include detailed instructions to describe the task performed by the underlying large language model (LLM) can easily transform foundation models into tools and services with minimal overhead. Because of their crucial impact on a system's utility, they are often considered intellectual property, similar to the code of a software product. However, system prompts can easily be extracted via prompt injection, and to date there is no effective countermeasure: carefully crafted prompt injections can evade all existing safeguards.

In this work, we propose an alternative to conventional system prompts. We introduce prompt obfuscation to prevent the extraction of the system prompt while maintaining the utility of the system itself with only little overhead. The core idea is to find a representation of the original system prompt that leads to the same functionality, while the obfuscated system prompt does not contain any information that allows conclusions to be drawn about the original system prompt. We implement an optimization-based method to find such an obfuscated prompt representation while preserving functionality. To evaluate our approach, we investigate eight different metrics to compare the performance of a system using the original and the obfuscated system prompts, and we show that the obfuscated version is consistently on par with the original one. We further perform three different deobfuscation attacks and show that, even with access to the obfuscated prompt and the LLM itself, meaningful information cannot be consistently extracted. Overall, we show that prompt obfuscation can be an effective method to protect intellectual property while maintaining the same utility as the original system prompt.
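
To make the optimization-based idea concrete, below is a minimal sketch of one plausible formulation: the obfuscated "prompt" is a trainable matrix of soft-prompt embeddings prepended to the user input, tuned so that the model's output distribution matches the one produced with the original system prompt. The model name, the example queries, the choice of matching only the next-token distribution, and all hyperparameters are illustrative assumptions, not the paper's exact setup.

```python
# Hedged sketch of optimization-based prompt obfuscation via soft prompts.
# Assumptions (not from the paper): gpt2 as a placeholder model, two toy
# queries, KL divergence on the next-token distribution as the objective.
import torch
import torch.nn.functional as F
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # placeholder; a chat-tuned LLM would be used in practice
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()
for p in model.parameters():
    p.requires_grad_(False)  # only the soft prompt is optimized
embed = model.get_input_embeddings()

original_system_prompt = "You are a helpful travel assistant. Answer concisely."
user_queries = [
    "What should I pack for a week in Norway?",
    "Suggest a three-day itinerary for Rome.",
]

# Trainable soft prompt; its length is a free hyperparameter (here: same
# length as the tokenized original prompt).
orig_ids = tok(original_system_prompt, return_tensors="pt").input_ids
soft_prompt = torch.nn.Parameter(
    torch.randn(orig_ids.shape[1], embed.embedding_dim) * 0.02
)
opt = torch.optim.Adam([soft_prompt], lr=1e-3)

def next_token_logits(prefix_embeds, query_ids):
    """Logits for the token following [prefix ; query]."""
    query_embeds = embed(query_ids)
    inputs = torch.cat([prefix_embeds, query_embeds], dim=1)
    return model(inputs_embeds=inputs).logits[:, -1, :]

for step in range(200):
    loss = 0.0
    for q in user_queries:
        q_ids = tok(q, return_tensors="pt").input_ids
        with torch.no_grad():
            # Teacher: behavior under the original (plain-text) system prompt.
            target = next_token_logits(embed(orig_ids), q_ids)
        # Student: behavior under the obfuscated soft prompt.
        pred = next_token_logits(soft_prompt.unsqueeze(0), q_ids)
        loss = loss + F.kl_div(
            F.log_softmax(pred, dim=-1),
            F.softmax(target, dim=-1),
            reduction="batchmean",
        )
    opt.zero_grad()
    loss.backward()
    opt.step()
```

The intuition behind this kind of formulation is that the optimized embedding matrix steers the model toward the same behavior as the original instructions but does not decode back into readable text, so an attacker who exfiltrates it learns little about the original system prompt.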