Jujutsu: A Two-stage Defense against Adversarial Patch Attacks on Deep Neural Networks

Proceedings of the 2023 ACM Asia Conference on Computer and Communications Security. Pub Date: 2021-08-11. DOI: 10.1145/3579856.3582816

Zitao Chen, Pritam Dash, K. Pattabiraman


Citations: 2

Abstract

Adversarial patch attacks create adversarial examples by injecting arbitrary distortions within a bounded region of the input to fool deep neural networks (DNNs). These attacks are robust (i.e., physically realizable) and universally malicious, and hence pose a severe security threat to real-world DNN-based systems. We propose Jujutsu, a two-stage technique to detect and mitigate robust and universal adversarial patch attacks. We first observe that adversarial patches are crafted as localized features that exert a large influence on the prediction output and continue to dominate the prediction on any input. Jujutsu leverages this observation for accurate attack detection with low false positives. Patch attacks corrupt only a localized region of the input, while the majority of the input remains unperturbed. Jujutsu therefore uses generative adversarial networks (GANs) to perform localized attack recovery: it synthesizes the semantic contents of the input that the attack corrupted and reconstructs a “clean” input for correct prediction. We evaluate Jujutsu on four diverse datasets and 8 different DNN models, and find that it achieves superior performance and significantly outperforms four existing defenses. We further evaluate Jujutsu against physical-world attacks, as well as adaptive attacks.
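The two stages described in the abstract can be illustrated with a small toy sketch. This is not the authors' implementation; it is only one way the stated observation (a patch is a localized, high-influence feature that keeps dominating the prediction on other inputs) might be operationalized. Every name below (`most_salient_window`, `transplant`, `looks_like_patch`, `mean_fill`, `toy_classify`) is hypothetical. Two simplifications are assumed: pixel intensity stands in for a real saliency map, and a plain mean fill stands in for the GAN-based reconstruction.

```python
from typing import Callable, List, Tuple

Image = List[List[float]]            # toy grayscale image: rows of pixel values
Classifier = Callable[[Image], int]


def most_salient_window(saliency: Image, k: int) -> Tuple[int, int]:
    """Top-left (row, col) of the k x k window with the largest total saliency."""
    h, w = len(saliency), len(saliency[0])
    best_sum, best_pos = float("-inf"), (0, 0)
    for r in range(h - k + 1):
        for c in range(w - k + 1):
            total = sum(saliency[r + i][c + j] for i in range(k) for j in range(k))
            if total > best_sum:
                best_sum, best_pos = total, (r, c)
    return best_pos


def transplant(src: Image, dst: Image, pos: Tuple[int, int], k: int) -> Image:
    """Copy the k x k region of src at pos onto a copy of dst, same position."""
    r, c = pos
    out = [row[:] for row in dst]
    for i in range(k):
        out[r + i][c:c + k] = src[r + i][c:c + k]
    return out


def looks_like_patch(classify: Classifier, image: Image, saliency: Image,
                     holdout: List[Image], k: int) -> bool:
    """Flag the input if its most salient region drags every differently
    labelled hold-out image to the input's own predicted label."""
    label = classify(image)
    pos = most_salient_window(saliency, k)
    others = [h for h in holdout if classify(h) != label]
    return bool(others) and all(
        classify(transplant(image, h, pos, k)) == label for h in others
    )


def mean_fill(image: Image, pos: Tuple[int, int], k: int) -> Image:
    """Stand-in for GAN inpainting (assumption): overwrite the region with
    the mean of all pixels outside it."""
    r, c = pos

    def inside(i: int, j: int) -> bool:
        return r <= i < r + k and c <= j < c + k

    outside = [v for i, row in enumerate(image)
               for j, v in enumerate(row) if not inside(i, j)]
    fill = sum(outside) / len(outside)
    return [[fill if inside(i, j) else v for j, v in enumerate(row)]
            for i, row in enumerate(image)]


def toy_classify(img: Image) -> int:
    """Hypothetical model: any near-white pixel forces class 2,
    otherwise mean brightness decides between classes 0 and 1."""
    pixels = [v for row in img for v in row]
    if max(pixels) >= 0.9:
        return 2
    return 1 if sum(pixels) / len(pixels) > 0.3 else 0


def flat(value: float, size: int = 4) -> Image:
    """Constant-valued square image."""
    return [[value] * size for _ in range(size)]


holdout = [flat(0.2), flat(0.4)]     # toy_classify assigns these classes 0 and 1

attacked = flat(0.1)
for i in (1, 2):
    for j in (1, 2):
        attacked[i][j] = 1.0         # 2 x 2 bright "patch"

benign = flat(0.4)
for i in (0, 1):
    for j in (0, 1):
        benign[i][j] = 0.5           # mildly salient but harmless feature

# Pixel intensity doubles as the saliency map in this toy setup.
attacked_flagged = looks_like_patch(toy_classify, attacked, attacked, holdout, 2)
benign_flagged = looks_like_patch(toy_classify, benign, benign, holdout, 2)
recovered = mean_fill(attacked, most_salient_window(attacked, 2), 2)
print(attacked_flagged, benign_flagged, toy_classify(recovered))
```

In this sketch the detection check only compares against hold-out images whose own label differs from the input's, so a benign input is not flagged merely because it shares a class with the hold-out set. A realistic pipeline would replace the intensity map with a gradient-based attribution method and the mean fill with a trained inpainting model.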
