针对视觉语言模型的多模态虚拟场景催眠越狱攻击

IF 7.6 1区 计算机科学 Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE
Xiayang Shi , Shangfeng Chen , Gang Zhang , Wei Wei , Yinlin Li , Zhaoxin Fan , Jingjing Liu
{"title":"针对视觉语言模型的多模态虚拟场景催眠越狱攻击","authors":"Xiayang Shi ,&nbsp;Shangfeng Chen ,&nbsp;Gang Zhang ,&nbsp;Wei Wei ,&nbsp;Yinlin Li ,&nbsp;Zhaoxin Fan ,&nbsp;Jingjing Liu","doi":"10.1016/j.patcog.2025.112391","DOIUrl":null,"url":null,"abstract":"<div><div>Due to the inherent vulnerabilities of large Vision-Language Models (VLMs), security governance has emerged as a critical concern, particularly given the risks posed by noisy and biased training data as well as adversarial attacks, including data poisoning and prompt injection. These perturbations can significantly degrade model performance and introduce multifaceted societal risks. To verify the safe robustness of VLMs and further inspire the design of defensive AI frameworks, we propose Virtual Scenario Hypnosis (VSH), a multimodal prompt injection jailbreak method that embeds malicious queries into prompts through a deceptive narrative framework. This approach strategically distracts the model while compromising its resistance to jailbreak attempts. Our methodology features two key innovations: 1) Targeted adversarial image prompts that transform textual content into visual layouts through optimized typographic designs, circumventing safety alignment mechanisms to elicit harmful responses; and 2) An information veil encrypted In-Context Learning (ICL) method for text prompts that systematically evades safety detection protocols. To streamline evaluation, we employ Large Language Models (LLMs) to facilitate an efficient assessment of jailbreak success rates, supported by a meticulously designed prompt template incorporating multi-dimensional scoring rules and evaluation metrics. Extensive experiments demonstrate the efficacy of VSH, achieving an overall success rate exceeding 82% on 500 harmful queries spanning multiple domains when tested against LLaVA-v1.5-13B and GPT-4o mini.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"172 ","pages":"Article 112391"},"PeriodicalIF":7.6000,"publicationDate":"2025-09-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Jailbreak attack with multimodal virtual scenario hypnosis for vision-language models\",\"authors\":\"Xiayang Shi ,&nbsp;Shangfeng Chen ,&nbsp;Gang Zhang ,&nbsp;Wei Wei ,&nbsp;Yinlin Li ,&nbsp;Zhaoxin Fan ,&nbsp;Jingjing Liu\",\"doi\":\"10.1016/j.patcog.2025.112391\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><div>Due to the inherent vulnerabilities of large Vision-Language Models (VLMs), security governance has emerged as a critical concern, particularly given the risks posed by noisy and biased training data as well as adversarial attacks, including data poisoning and prompt injection. These perturbations can significantly degrade model performance and introduce multifaceted societal risks. To verify the safe robustness of VLMs and further inspire the design of defensive AI frameworks, we propose Virtual Scenario Hypnosis (VSH), a multimodal prompt injection jailbreak method that embeds malicious queries into prompts through a deceptive narrative framework. This approach strategically distracts the model while compromising its resistance to jailbreak attempts. Our methodology features two key innovations: 1) Targeted adversarial image prompts that transform textual content into visual layouts through optimized typographic designs, circumventing safety alignment mechanisms to elicit harmful responses; and 2) An information veil encrypted In-Context Learning (ICL) method for text prompts that systematically evades safety detection protocols. To streamline evaluation, we employ Large Language Models (LLMs) to facilitate an efficient assessment of jailbreak success rates, supported by a meticulously designed prompt template incorporating multi-dimensional scoring rules and evaluation metrics. Extensive experiments demonstrate the efficacy of VSH, achieving an overall success rate exceeding 82% on 500 harmful queries spanning multiple domains when tested against LLaVA-v1.5-13B and GPT-4o mini.</div></div>\",\"PeriodicalId\":49713,\"journal\":{\"name\":\"Pattern Recognition\",\"volume\":\"172 \",\"pages\":\"Article 112391\"},\"PeriodicalIF\":7.6000,\"publicationDate\":\"2025-09-04\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Pattern Recognition\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S0031320325010520\",\"RegionNum\":1,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Pattern Recognition","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0031320325010520","RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
引用次数: 0

摘要

由于大型视觉语言模型(VLMs)固有的漏洞,安全治理已经成为一个关键问题,特别是考虑到噪声和有偏见的训练数据以及对抗性攻击(包括数据中毒和及时注入)所带来的风险。这些扰动会显著降低模型的性能,并引入多方面的社会风险。为了验证vlm的安全鲁棒性并进一步启发防御性AI框架的设计,我们提出了虚拟场景催眠(VSH),这是一种多模态提示注入越狱方法,通过欺骗性叙事框架将恶意查询嵌入提示中。这种方法在策略上分散了模型的注意力,同时损害了模型对越狱尝试的抵抗力。我们的方法有两个关键创新:1)有针对性的对抗性图像提示,通过优化的排版设计将文本内容转化为视觉布局,绕过安全对齐机制,引发有害反应;2)一种信息面纱加密的上下文学习(ICL)方法,用于系统地逃避安全检测协议的文本提示。为了简化评估,我们采用大型语言模型(llm)来促进对越狱成功率的有效评估,并辅以精心设计的包含多维评分规则和评估指标的提示模板。大量的实验证明了VSH的有效性,在针对LLaVA-v1.5-13B和gpt - 40 mini进行测试时,VSH在跨越多个域的500个有害查询中实现了超过82%的总体成功率。
本文章由计算机程序翻译,如有差异,请以英文原文为准。

Jailbreak attack with multimodal virtual scenario hypnosis for vision-language models

Jailbreak attack with multimodal virtual scenario hypnosis for vision-language models
Due to the inherent vulnerabilities of large Vision-Language Models (VLMs), security governance has emerged as a critical concern, particularly given the risks posed by noisy and biased training data as well as adversarial attacks, including data poisoning and prompt injection. These perturbations can significantly degrade model performance and introduce multifaceted societal risks. To verify the safe robustness of VLMs and further inspire the design of defensive AI frameworks, we propose Virtual Scenario Hypnosis (VSH), a multimodal prompt injection jailbreak method that embeds malicious queries into prompts through a deceptive narrative framework. This approach strategically distracts the model while compromising its resistance to jailbreak attempts. Our methodology features two key innovations: 1) Targeted adversarial image prompts that transform textual content into visual layouts through optimized typographic designs, circumventing safety alignment mechanisms to elicit harmful responses; and 2) An information veil encrypted In-Context Learning (ICL) method for text prompts that systematically evades safety detection protocols. To streamline evaluation, we employ Large Language Models (LLMs) to facilitate an efficient assessment of jailbreak success rates, supported by a meticulously designed prompt template incorporating multi-dimensional scoring rules and evaluation metrics. Extensive experiments demonstrate the efficacy of VSH, achieving an overall success rate exceeding 82% on 500 harmful queries spanning multiple domains when tested against LLaVA-v1.5-13B and GPT-4o mini.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
Pattern Recognition
Pattern Recognition 工程技术-工程:电子与电气
CiteScore
14.40
自引率
16.20%
发文量
683
审稿时长
5.6 months
期刊介绍: The field of Pattern Recognition is both mature and rapidly evolving, playing a crucial role in various related fields such as computer vision, image processing, text analysis, and neural networks. It closely intersects with machine learning and is being applied in emerging areas like biometrics, bioinformatics, multimedia data analysis, and data science. The journal Pattern Recognition, established half a century ago during the early days of computer science, has since grown significantly in scope and influence.
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:604180095
Book学术官方微信