{"title":"On The Generation of Unrestricted Adversarial Examples","authors":"Mehrgan Khoshpasand, A. Ghorbani","doi":"10.1109/DSN-W50199.2020.00012","DOIUrl":null,"url":null,"abstract":"Adversarial examples are inputs designed by an adversary with the goal of fooling the machine learning models. Most of the research about adversarial examples have focused on perturbing the natural inputs with the assumption that the true label remains unchanged. Even in this limited setting and despite extensive studies in recent years, there is no defence against adversarial examples for complex tasks (e.g., ImageNet). However, for simpler tasks like handwritten digit classification, a robust model seems to be within reach. Unlike perturbation-based adversarial examples, the adversary is not limited to small norm-based perturbations in unrestricted adversarial examples. Hence, defending against unrestricted adversarial examples is a more challenging task.In this paper, we show that previous methods for generating unrestricted adversarial examples ignored a large part of the adversarial subspace. In particular, we demonstrate the bias of previous methods towards generating samples that are far inside the decision boundaries of an auxiliary classifier. We also show the similarity of the decision boundaries of an auxiliary classifier and baseline CNNs. By putting these two evidence together, we explain why adversarial examples generated by the previous approaches lack the desired transferability. Additionally, we present an efficient technique to create adversarial examples using generative adversarial networks to address this issue. We demonstrate that even the state-of-the-art MNIST classifiers are vulnerable to the adversarial examples generated with this technique. Additionally, we show that examples generated with our method are transferable. Accordingly, we hope that new proposed defences use this attack to evaluate the robustness of their models against unrestricted attacks.","PeriodicalId":427687,"journal":{"name":"2020 50th Annual IEEE/IFIP International Conference on Dependable Systems and Networks Workshops (DSN-W)","volume":"62 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2020-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2020 50th Annual IEEE/IFIP International Conference on Dependable Systems and Networks Workshops (DSN-W)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/DSN-W50199.2020.00012","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 1
Abstract
Adversarial examples are inputs designed by an adversary with the goal of fooling machine learning models. Most research on adversarial examples has focused on perturbing natural inputs under the assumption that the true label remains unchanged. Even in this limited setting, and despite extensive study in recent years, there is no effective defence against adversarial examples for complex tasks (e.g., ImageNet). However, for simpler tasks such as handwritten digit classification, a robust model seems to be within reach. Unlike perturbation-based adversarial examples, unrestricted adversarial examples do not limit the adversary to small norm-bounded perturbations. Hence, defending against unrestricted adversarial examples is a more challenging task. In this paper, we show that previous methods for generating unrestricted adversarial examples ignored a large part of the adversarial subspace. In particular, we demonstrate the bias of previous methods towards generating samples that lie far inside the decision boundaries of an auxiliary classifier. We also show the similarity between the decision boundaries of an auxiliary classifier and those of baseline CNNs. Putting these two pieces of evidence together, we explain why adversarial examples generated by previous approaches lack the desired transferability. Additionally, we present an efficient technique for creating adversarial examples using generative adversarial networks that addresses this issue. We demonstrate that even state-of-the-art MNIST classifiers are vulnerable to the adversarial examples generated with this technique, and we show that examples generated with our method are transferable. Accordingly, we hope that newly proposed defences use this attack to evaluate the robustness of their models against unrestricted attacks.
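The abstract only summarises the attack at a high level. As a rough illustration of the general idea behind GAN-based unrestricted adversarial examples, the sketch below searches a generator's latent space for an image that an auxiliary classifier still assigns the intended label while the target classifier is fooled. The model handles (`generator`, `target_clf`, `aux_clf`) and all hyper-parameters are assumptions made for illustration; this is not the authors' exact procedure.

```python
# Hypothetical sketch: latent-space search for an unrestricted adversarial
# example. `generator`, `target_clf`, and `aux_clf` are assumed pretrained
# models; hyper-parameters are illustrative only.
import torch
import torch.nn.functional as F

def search_unrestricted(generator, target_clf, aux_clf, source_class,
                        latent_dim=128, steps=200, lr=0.05, device="cpu"):
    """Optimise a latent code z so the generated image is still recognised
    as `source_class` by the auxiliary classifier while the target
    classifier assigns it a different label."""
    z = torch.randn(1, latent_dim, device=device, requires_grad=True)
    y = torch.tensor([source_class], device=device)
    opt = torch.optim.Adam([z], lr=lr)

    for _ in range(steps):
        x = generator(z)                                # candidate image from the GAN
        aux_loss = F.cross_entropy(aux_clf(x), y)       # keep aux classifier on the true label
        tgt_loss = -F.cross_entropy(target_clf(x), y)   # push target classifier off the true label
        loss = aux_loss + tgt_loss
        opt.zero_grad()
        loss.backward()
        opt.step()

        with torch.no_grad():
            x = generator(z)
            if (aux_clf(x).argmax(1) == y).item() and \
               (target_clf(x).argmax(1) != y).item():
                return x.detach()                       # candidate unrestricted adversarial example
    return None
```

The auxiliary classifier here acts as a stand-in oracle for the true label, which is why the paper's observations about its decision boundaries, and their similarity to those of baseline CNNs, matter for the transferability of the generated examples.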