Jingqi Hu, Li Li, Hanzhou Wu, Huixin Luo, Xinpeng Zhang
{"title":"Dormant key: Unlocking universal adversarial control in text-to-image models.","authors":"Jingqi Hu, Li Li, Hanzhou Wu, Huixin Luo, Xinpeng Zhang","doi":"10.1016/j.neunet.2025.108065","DOIUrl":null,"url":null,"abstract":"<p><p>Text-to-Image (T2I) diffusion models have gained significant traction due to their remarkable image generation capabilities, raising growing concerns over the security risks associated with their use. Prior studies have shown that malicious users can subtly modify prompts to produce visually misleading or Not-Safe-For-Work (NSFW) content, even bypassing existing safety filters. Existing adversarial attacks are often optimized for specific prompts, limiting their generalizability, and their text-space perturbations are easily detectable by current defenses. To address these limitations, we propose a universal adversarial attack framework called dormant key. It appends a transferable suffix that can be appended as a \"plug-in\" to any text input to guide the generated image toward a specific target. To ensure robustness across diverse prompts, we introduce a novel hierarchical gradient aggregation strategy that stabilizes optimization over prompt batches. This enables efficient learning of universal perturbations in the text space, improving both attack transferability and imperceptibility. Experimental results show that our method effectively balances attack performance and stealth. In NSFW generation tasks, it bypasses major safety mechanisms, including keyword filtering, semantic analysis, and text classifiers, and achieves over 18 % improvement in success rate over baselines.</p>","PeriodicalId":49763,"journal":{"name":"Neural Networks","volume":"193 ","pages":"108065"},"PeriodicalIF":6.3000,"publicationDate":"2025-09-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Neural Networks","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.1016/j.neunet.2025.108065","RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
引用次数: 0
Abstract
Text-to-Image (T2I) diffusion models have gained significant traction due to their remarkable image generation capabilities, raising growing concerns over the security risks associated with their use. Prior studies have shown that malicious users can subtly modify prompts to produce visually misleading or Not-Safe-For-Work (NSFW) content, even bypassing existing safety filters. Existing adversarial attacks are often optimized for specific prompts, limiting their generalizability, and their text-space perturbations are easily detectable by current defenses. To address these limitations, we propose a universal adversarial attack framework called dormant key. It appends a transferable suffix that can be appended as a "plug-in" to any text input to guide the generated image toward a specific target. To ensure robustness across diverse prompts, we introduce a novel hierarchical gradient aggregation strategy that stabilizes optimization over prompt batches. This enables efficient learning of universal perturbations in the text space, improving both attack transferability and imperceptibility. Experimental results show that our method effectively balances attack performance and stealth. In NSFW generation tasks, it bypasses major safety mechanisms, including keyword filtering, semantic analysis, and text classifiers, and achieves over 18 % improvement in success rate over baselines.
期刊介绍:
Neural Networks is a platform that aims to foster an international community of scholars and practitioners interested in neural networks, deep learning, and other approaches to artificial intelligence and machine learning. Our journal invites submissions covering various aspects of neural networks research, from computational neuroscience and cognitive modeling to mathematical analyses and engineering applications. By providing a forum for interdisciplinary discussions between biology and technology, we aim to encourage the development of biologically-inspired artificial intelligence.