ZOO: Zeroth Order Optimization Based Black-box Attacks to Deep Neural Networks without Training Substitute Models

Pin-Yu Chen, Huan Zhang, Yash Sharma, Jinfeng Yi, Cho-Jui Hsieh
{"title":"基于零阶优化的无训练替代模型的深度神经网络黑盒攻击","authors":"Pin-Yu Chen, Huan Zhang, Yash Sharma, Jinfeng Yi, Cho-Jui Hsieh","doi":"10.1145/3128572.3140448","DOIUrl":null,"url":null,"abstract":"Deep neural networks (DNNs) are one of the most prominent technologies of our time, as they achieve state-of-the-art performance in many machine learning tasks, including but not limited to image classification, text mining, and speech processing. However, recent research on DNNs has indicated ever-increasing concern on the robustness to adversarial examples, especially for security-critical tasks such as traffic sign identification for autonomous driving. Studies have unveiled the vulnerability of a well-trained DNN by demonstrating the ability of generating barely noticeable (to both human and machines) adversarial images that lead to misclassification. Furthermore, researchers have shown that these adversarial images are highly transferable by simply training and attacking a substitute model built upon the target model, known as a black-box attack to DNNs. Similar to the setting of training substitute models, in this paper we propose an effective black-box attack that also only has access to the input (images) and the output (confidence scores) of a targeted DNN. However, different from leveraging attack transferability from substitute models, we propose zeroth order optimization (ZOO) based attacks to directly estimate the gradients of the targeted DNN for generating adversarial examples. We use zeroth order stochastic coordinate descent along with dimension reduction, hierarchical attack and importance sampling techniques to efficiently attack black-box models. By exploiting zeroth order optimization, improved attacks to the targeted DNN can be accomplished, sparing the need for training substitute models and avoiding the loss in attack transferability. Experimental results on MNIST, CIFAR10 and ImageNet show that the proposed ZOO attack is as effective as the state-of-the-art white-box attack (e.g., Carlini and Wagner's attack) and significantly outperforms existing black-box attacks via substitute models.","PeriodicalId":318259,"journal":{"name":"Proceedings of the 10th ACM Workshop on Artificial Intelligence and Security","volume":"1 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2017-08-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1445","resultStr":"{\"title\":\"ZOO: Zeroth Order Optimization Based Black-box Attacks to Deep Neural Networks without Training Substitute Models\",\"authors\":\"Pin-Yu Chen, Huan Zhang, Yash Sharma, Jinfeng Yi, Cho-Jui Hsieh\",\"doi\":\"10.1145/3128572.3140448\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Deep neural networks (DNNs) are one of the most prominent technologies of our time, as they achieve state-of-the-art performance in many machine learning tasks, including but not limited to image classification, text mining, and speech processing. However, recent research on DNNs has indicated ever-increasing concern on the robustness to adversarial examples, especially for security-critical tasks such as traffic sign identification for autonomous driving. Studies have unveiled the vulnerability of a well-trained DNN by demonstrating the ability of generating barely noticeable (to both human and machines) adversarial images that lead to misclassification. 
Furthermore, researchers have shown that these adversarial images are highly transferable by simply training and attacking a substitute model built upon the target model, known as a black-box attack to DNNs. Similar to the setting of training substitute models, in this paper we propose an effective black-box attack that also only has access to the input (images) and the output (confidence scores) of a targeted DNN. However, different from leveraging attack transferability from substitute models, we propose zeroth order optimization (ZOO) based attacks to directly estimate the gradients of the targeted DNN for generating adversarial examples. We use zeroth order stochastic coordinate descent along with dimension reduction, hierarchical attack and importance sampling techniques to efficiently attack black-box models. By exploiting zeroth order optimization, improved attacks to the targeted DNN can be accomplished, sparing the need for training substitute models and avoiding the loss in attack transferability. Experimental results on MNIST, CIFAR10 and ImageNet show that the proposed ZOO attack is as effective as the state-of-the-art white-box attack (e.g., Carlini and Wagner's attack) and significantly outperforms existing black-box attacks via substitute models.\",\"PeriodicalId\":318259,\"journal\":{\"name\":\"Proceedings of the 10th ACM Workshop on Artificial Intelligence and Security\",\"volume\":\"1 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2017-08-14\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"1445\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the 10th ACM Workshop on Artificial Intelligence and Security\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3128572.3140448\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 10th ACM Workshop on Artificial Intelligence and Security","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3128572.3140448","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 1445

Abstract

Deep neural networks (DNNs) are one of the most prominent technologies of our time, as they achieve state-of-the-art performance in many machine learning tasks, including but not limited to image classification, text mining, and speech processing. However, recent research on DNNs has raised ever-increasing concern about their robustness to adversarial examples, especially for security-critical tasks such as traffic sign identification for autonomous driving. Studies have unveiled the vulnerability of a well-trained DNN by demonstrating the ability to generate adversarial images that are barely noticeable (to both humans and machines) yet lead to misclassification. Furthermore, researchers have shown that these adversarial images are highly transferable: simply training and attacking a substitute model built upon the target model suffices, a setting known as a black-box attack on DNNs. Similar to the setting of training substitute models, in this paper we propose an effective black-box attack that also only has access to the input (images) and the output (confidence scores) of a targeted DNN. However, rather than leveraging attack transferability from substitute models, we propose zeroth order optimization (ZOO) based attacks that directly estimate the gradients of the targeted DNN for generating adversarial examples. We use zeroth order stochastic coordinate descent along with dimension reduction, hierarchical attack, and importance sampling techniques to efficiently attack black-box models. By exploiting zeroth order optimization, improved attacks on the targeted DNN can be accomplished, sparing the need for training substitute models and avoiding the loss in attack transferability. Experimental results on MNIST, CIFAR10, and ImageNet show that the proposed ZOO attack is as effective as the state-of-the-art white-box attack (e.g., Carlini and Wagner's attack) and significantly outperforms existing black-box attacks via substitute models.
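To make the query mechanics concrete, the following is a minimal sketch of the core idea, not the authors' released implementation: estimate the gradient of an attack loss along one randomly chosen coordinate with a symmetric difference quotient (two black-box queries), then take a coordinate descent step. The black-box callable f, the Carlini-Wagner-style attack_loss, and the hyperparameters H, lr, and steps are our illustrative assumptions, and a plain gradient step stands in for the coordinate-wise ADAM/Newton updates the full method uses.

import numpy as np

# Minimal sketch of a ZOO-style attack (illustrative, not the paper's code).
# Assumption: `f` is a black-box callable mapping an image in [0, 1] to a
# vector of class confidence scores, matching the threat model above.

H = 1e-4  # finite-difference step size (illustrative choice)

def attack_loss(confidences, target):
    # Targeted Carlini-Wagner-style loss: positive until the target class
    # has the largest log-confidence, zero once the attack succeeds.
    log_p = np.log(np.maximum(confidences, 1e-30))
    best_other = np.max(np.delete(log_p, target))
    return max(best_other - log_p[target], 0.0)

def coord_grad(f, x, i, target):
    # Symmetric difference quotient along coordinate i: the partial
    # derivative is estimated from just two black-box queries.
    e = np.zeros_like(x)
    e.flat[i] = H
    return (attack_loss(f(x + e), target)
            - attack_loss(f(x - e), target)) / (2.0 * H)

def zoo_attack(f, x0, target, steps=1000, lr=0.01, seed=0):
    # Zeroth order stochastic coordinate descent: each step picks one
    # random pixel, estimates its partial derivative, and updates it.
    rng = np.random.default_rng(seed)
    x = x0.astype(np.float64)
    for _ in range(steps):
        i = rng.integers(x.size)  # importance sampling would bias this draw
        x.flat[i] -= lr * coord_grad(f, x, i, target)
        x = np.clip(x, 0.0, 1.0)  # stay in the valid image box
    return x

Each update costs only two model queries, which is why the abstract pairs this estimator with dimension reduction, a hierarchical attack, and importance sampling: they shrink and prioritize the coordinate space so that ImageNet-scale inputs remain attackable within a reasonable query budget. For brevity, the sketch also omits the distortion penalty on the perturbation that the full objective uses to keep adversarial images barely noticeable.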