{"title":"通过特征空间深度强化学习实现黑箱对抗攻击","authors":"Lyue Li, Amir Rezapour, Wen-Guey Tzeng","doi":"10.1109/DSC49826.2021.9346264","DOIUrl":null,"url":null,"abstract":"In this paper we propose a novel black-box adversarial attack by using the reinforcement learning to learn the characteristics of the target classifier C. Our method does not need to find a substitute classifier that resembles $C$ with respect to its structure and parameters. Instead, our method learns an optimal attacking policy of guiding the attacker to build an adversarial image from the original image. We work on the feature space of images, instead of the pixels of images directly. Our method achieves better results on many measures. Our method achieves 94.5 % attack success rate on a well-trained digit classifier. Our adversarial images have better imperceptibility even though the norm distances to original images are larger than other methods. Since our method works on the characteristics of a classifier, it has better transferability. The transfer rate of our method could reach 52.1 % for a targeted class and 65.9% for a non-targeted class. This improves over previous results of single-digit transfer rates. Also, we show that it is harder to defend our attack by incorporating defense mechanisms, such as MagNet, which uses a denoising technique. We show that our method achieves 65% attack success rate even though the target classifier employs MagNet to defend.","PeriodicalId":184504,"journal":{"name":"2021 IEEE Conference on Dependable and Secure Computing (DSC)","volume":"18 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-01-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"A Black-Box Adversarial Attack via Deep Reinforcement Learning on the Feature Space\",\"authors\":\"Lyue Li, Amir Rezapour, Wen-Guey Tzeng\",\"doi\":\"10.1109/DSC49826.2021.9346264\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"In this paper we propose a novel black-box adversarial attack by using the reinforcement learning to learn the characteristics of the target classifier C. Our method does not need to find a substitute classifier that resembles $C$ with respect to its structure and parameters. Instead, our method learns an optimal attacking policy of guiding the attacker to build an adversarial image from the original image. We work on the feature space of images, instead of the pixels of images directly. Our method achieves better results on many measures. Our method achieves 94.5 % attack success rate on a well-trained digit classifier. Our adversarial images have better imperceptibility even though the norm distances to original images are larger than other methods. Since our method works on the characteristics of a classifier, it has better transferability. The transfer rate of our method could reach 52.1 % for a targeted class and 65.9% for a non-targeted class. This improves over previous results of single-digit transfer rates. Also, we show that it is harder to defend our attack by incorporating defense mechanisms, such as MagNet, which uses a denoising technique. 
We show that our method achieves 65% attack success rate even though the target classifier employs MagNet to defend.\",\"PeriodicalId\":184504,\"journal\":{\"name\":\"2021 IEEE Conference on Dependable and Secure Computing (DSC)\",\"volume\":\"18 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2021-01-30\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2021 IEEE Conference on Dependable and Secure Computing (DSC)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/DSC49826.2021.9346264\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2021 IEEE Conference on Dependable and Secure Computing (DSC)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/DSC49826.2021.9346264","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
A Black-Box Adversarial Attack via Deep Reinforcement Learning on the Feature Space
In this paper we propose a novel black-box adversarial attack that uses deep reinforcement learning to learn the characteristics of the target classifier C. Our method does not need to find a substitute classifier that resembles C in structure and parameters. Instead, it learns an optimal attack policy that guides the attacker in building an adversarial image from an original image. We work on the feature space of images rather than directly on their pixels. Our method achieves better results on many measures. It attains a 94.5% attack success rate against a well-trained digit classifier. Our adversarial images have better imperceptibility even though their norm distances to the original images are larger than those of other methods. Since our method learns the characteristics of the classifier itself, it also has better transferability: the transfer rate reaches 52.1% for a targeted class and 65.9% for a non-targeted class, improving over previously reported single-digit transfer rates. Finally, we show that our attack is harder to stop with defense mechanisms such as MagNet, which uses a denoising technique: our method still achieves a 65% attack success rate even when the target classifier employs MagNet as a defense.
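To make the abstract's high-level loop concrete, here is a minimal sketch of a feature-space RL attack of the kind described above. It is not the authors' implementation: the encoder/decoder, the policy, the reward, and every name here (encode, decode, black_box_query) are hypothetical stand-ins, and the update is a crude REINFORCE-style step on a Gaussian policy rather than the paper's learned attack policy.

```python
# Hypothetical sketch of a black-box, feature-space RL attack loop.
# The agent perturbs features (not pixels), decodes back to an image,
# and is rewarded only by the black-box classifier's confidence drop.
import numpy as np

rng = np.random.default_rng(0)
D_PIX, D_FEAT = 784, 32            # e.g. 28x28 digit images, 32-dim feature space

# Toy linear "autoencoder" standing in for a learned feature extractor.
W = rng.normal(0, 0.05, (D_FEAT, D_PIX))
encode = lambda x: W @ x           # pixels -> features
decode = lambda z: W.T @ z         # features -> pixels (approximate inverse)

def black_box_query(x, true_label=3):
    """Stand-in for the target classifier C: returns only the confidence
    assigned to the true label (no gradients, matching the black-box setting)."""
    logits = rng.normal(0, 1, 10)
    logits[true_label] += 5.0 - 0.1 * np.linalg.norm(x)   # toy decision surface
    p = np.exp(logits - logits.max())
    return (p / p.sum())[true_label]

theta = np.zeros(D_FEAT)           # mean of a Gaussian policy over feature perturbations
x0 = rng.random(D_PIX)             # "original image"
z0 = encode(x0)

for step in range(200):
    delta = theta + 0.1 * rng.normal(size=D_FEAT)    # sample a feature-space action
    x_adv = np.clip(decode(z0 + delta), 0.0, 1.0)    # build candidate adversarial image
    reward = 1.0 - black_box_query(x_adv)            # reward: drop in C's confidence
    theta += 0.05 * reward * (delta - theta)         # REINFORCE-style policy update
```

The design point this sketch illustrates is the one the abstract emphasizes: because the action space is the feature space rather than raw pixels, perturbations follow the image manifold, which is consistent with the paper's claim of better imperceptibility at larger norm distances and of transferability driven by classifier characteristics rather than pixel-level noise.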