Xiayang Shi , Shangfeng Chen , Gang Zhang , Wei Wei , Yinlin Li , Zhaoxin Fan , Jingjing Liu
{"title":"针对视觉语言模型的多模态虚拟场景催眠越狱攻击","authors":"Xiayang Shi , Shangfeng Chen , Gang Zhang , Wei Wei , Yinlin Li , Zhaoxin Fan , Jingjing Liu","doi":"10.1016/j.patcog.2025.112391","DOIUrl":null,"url":null,"abstract":"<div><div>Due to the inherent vulnerabilities of large Vision-Language Models (VLMs), security governance has emerged as a critical concern, particularly given the risks posed by noisy and biased training data as well as adversarial attacks, including data poisoning and prompt injection. These perturbations can significantly degrade model performance and introduce multifaceted societal risks. To verify the safe robustness of VLMs and further inspire the design of defensive AI frameworks, we propose Virtual Scenario Hypnosis (VSH), a multimodal prompt injection jailbreak method that embeds malicious queries into prompts through a deceptive narrative framework. This approach strategically distracts the model while compromising its resistance to jailbreak attempts. Our methodology features two key innovations: 1) Targeted adversarial image prompts that transform textual content into visual layouts through optimized typographic designs, circumventing safety alignment mechanisms to elicit harmful responses; and 2) An information veil encrypted In-Context Learning (ICL) method for text prompts that systematically evades safety detection protocols. To streamline evaluation, we employ Large Language Models (LLMs) to facilitate an efficient assessment of jailbreak success rates, supported by a meticulously designed prompt template incorporating multi-dimensional scoring rules and evaluation metrics. Extensive experiments demonstrate the efficacy of VSH, achieving an overall success rate exceeding 82% on 500 harmful queries spanning multiple domains when tested against LLaVA-v1.5-13B and GPT-4o mini.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"172 ","pages":"Article 112391"},"PeriodicalIF":7.6000,"publicationDate":"2025-09-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Jailbreak attack with multimodal virtual scenario hypnosis for vision-language models\",\"authors\":\"Xiayang Shi , Shangfeng Chen , Gang Zhang , Wei Wei , Yinlin Li , Zhaoxin Fan , Jingjing Liu\",\"doi\":\"10.1016/j.patcog.2025.112391\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><div>Due to the inherent vulnerabilities of large Vision-Language Models (VLMs), security governance has emerged as a critical concern, particularly given the risks posed by noisy and biased training data as well as adversarial attacks, including data poisoning and prompt injection. These perturbations can significantly degrade model performance and introduce multifaceted societal risks. To verify the safe robustness of VLMs and further inspire the design of defensive AI frameworks, we propose Virtual Scenario Hypnosis (VSH), a multimodal prompt injection jailbreak method that embeds malicious queries into prompts through a deceptive narrative framework. This approach strategically distracts the model while compromising its resistance to jailbreak attempts. Our methodology features two key innovations: 1) Targeted adversarial image prompts that transform textual content into visual layouts through optimized typographic designs, circumventing safety alignment mechanisms to elicit harmful responses; and 2) An information veil encrypted In-Context Learning (ICL) method for text prompts that systematically evades safety detection protocols. To streamline evaluation, we employ Large Language Models (LLMs) to facilitate an efficient assessment of jailbreak success rates, supported by a meticulously designed prompt template incorporating multi-dimensional scoring rules and evaluation metrics. Extensive experiments demonstrate the efficacy of VSH, achieving an overall success rate exceeding 82% on 500 harmful queries spanning multiple domains when tested against LLaVA-v1.5-13B and GPT-4o mini.</div></div>\",\"PeriodicalId\":49713,\"journal\":{\"name\":\"Pattern Recognition\",\"volume\":\"172 \",\"pages\":\"Article 112391\"},\"PeriodicalIF\":7.6000,\"publicationDate\":\"2025-09-04\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Pattern Recognition\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S0031320325010520\",\"RegionNum\":1,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Pattern Recognition","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0031320325010520","RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
Jailbreak attack with multimodal virtual scenario hypnosis for vision-language models
Due to the inherent vulnerabilities of large Vision-Language Models (VLMs), security governance has emerged as a critical concern, particularly given the risks posed by noisy and biased training data as well as adversarial attacks, including data poisoning and prompt injection. These perturbations can significantly degrade model performance and introduce multifaceted societal risks. To verify the safe robustness of VLMs and further inspire the design of defensive AI frameworks, we propose Virtual Scenario Hypnosis (VSH), a multimodal prompt injection jailbreak method that embeds malicious queries into prompts through a deceptive narrative framework. This approach strategically distracts the model while compromising its resistance to jailbreak attempts. Our methodology features two key innovations: 1) Targeted adversarial image prompts that transform textual content into visual layouts through optimized typographic designs, circumventing safety alignment mechanisms to elicit harmful responses; and 2) An information veil encrypted In-Context Learning (ICL) method for text prompts that systematically evades safety detection protocols. To streamline evaluation, we employ Large Language Models (LLMs) to facilitate an efficient assessment of jailbreak success rates, supported by a meticulously designed prompt template incorporating multi-dimensional scoring rules and evaluation metrics. Extensive experiments demonstrate the efficacy of VSH, achieving an overall success rate exceeding 82% on 500 harmful queries spanning multiple domains when tested against LLaVA-v1.5-13B and GPT-4o mini.
期刊介绍:
The field of Pattern Recognition is both mature and rapidly evolving, playing a crucial role in various related fields such as computer vision, image processing, text analysis, and neural networks. It closely intersects with machine learning and is being applied in emerging areas like biometrics, bioinformatics, multimedia data analysis, and data science. The journal Pattern Recognition, established half a century ago during the early days of computer science, has since grown significantly in scope and influence.