Defense-Net: Defend Against a Wide Range of Adversarial Attacks through Adversarial Detector

2019 IEEE Computer Society Annual Symposium on VLSI (ISVLSI) Pub Date : 2019-07-15 DOI:10.1109/ISVLSI.2019.00067

A. S. Rakin, Deliang Fan


Citations: 4

Abstract

Recent studies have demonstrated that Deep Neural Networks (DNNs) are vulnerable to adversarial input perturbations: slight, meticulously engineered perturbations can cause valid images to be misclassified. Adversarial training has been one of the successful defense approaches in recent times. In this work, we propose an alternative to adversarial training: instead of training the original classifier with adversarial examples, we train a separate model on them. We train an adversarial detector network, called 'Defense-Net', against a strong adversary, while the original classifier is trained only on clean training data. We propose a new adversarial cross-entropy loss function to train Defense-Net to appropriately differentiate between different adversarial examples. Defense-Net addresses three major concerns in developing a successful adversarial defense method. First, unlike traditional adversarial-training-based defenses, our defense does not degrade accuracy on clean data. Second, we demonstrate its resilience with experiments on the MNIST and CIFAR-10 data sets, showing that the state-of-the-art accuracy under the most powerful known white-box attack increases from 94.02% to 99.2% on MNIST and from 47% to 94.79% on CIFAR-10. Finally, unlike most recent defenses, our approach does not suffer from obfuscated gradients and can successfully defend against strong BPDA, PGD, FGSM, and C&W attacks.
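The abstract does not specify the detector architecture or the form of the adversarial cross-entropy loss, so the following is only an illustrative sketch of the general idea: adversarial examples are generated against a classifier trained on clean data, and a separate detector is given clean and adversarial inputs labeled as such. It assumes a toy logistic-regression classifier and a single-step FGSM attack; the weights, the helper names (`fgsm`, `build_detector_dataset`), and the plain 0/1 detector labels are hypothetical and are not taken from the paper.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def predict(x, w, b):
    # Probability of class 1 under a toy logistic-regression classifier.
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

def bce_loss(x, y, w, b):
    # Binary cross-entropy of the classifier on one labeled input.
    p = min(max(predict(x, w, b), 1e-12), 1.0 - 1e-12)
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))

def sign(v):
    return (v > 0) - (v < 0)

def fgsm(x, y, w, b, eps):
    # FGSM: move each feature by eps in the direction that increases the loss.
    # For this model, d(loss)/d(x_i) = (p - y) * w_i.
    p = predict(x, w, b)
    return [min(1.0, max(0.0, xi + eps * sign((p - y) * wi)))
            for xi, wi in zip(x, w)]

def build_detector_dataset(samples, labels, w, b, eps):
    # Pair every clean input (detector label 0) with its adversarial
    # counterpart (detector label 1). The classifier itself is untouched.
    data = []
    for x, y in zip(samples, labels):
        data.append((x, 0))
        data.append((fgsm(x, y, w, b, eps), 1))
    return data

# Hypothetical classifier weights and inputs, with features in [0, 1].
w, b = [2.0, -1.0, 0.5], 0.1
samples = [[0.2, 0.7, 0.5], [0.9, 0.1, 0.4]]
labels = [0, 1]
dataset = build_detector_dataset(samples, labels, w, b, eps=0.1)
```

In this sketch the classifier parameters are never updated with adversarial inputs, which mirrors the separation the abstract describes; only the detector would be trained on `dataset`. A stronger iterative attack such as PGD could be substituted for `fgsm` without changing the dataset-building step.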


