{"title":"通过特征空间深度强化学习实现黑箱对抗攻击","authors":"Lyue Li, Amir Rezapour, Wen-Guey Tzeng","doi":"10.1109/DSC49826.2021.9346264","DOIUrl":null,"url":null,"abstract":"In this paper we propose a novel black-box adversarial attack by using the reinforcement learning to learn the characteristics of the target classifier C. Our method does not need to find a substitute classifier that resembles $C$ with respect to its structure and parameters. Instead, our method learns an optimal attacking policy of guiding the attacker to build an adversarial image from the original image. We work on the feature space of images, instead of the pixels of images directly. Our method achieves better results on many measures. Our method achieves 94.5 % attack success rate on a well-trained digit classifier. Our adversarial images have better imperceptibility even though the norm distances to original images are larger than other methods. Since our method works on the characteristics of a classifier, it has better transferability. The transfer rate of our method could reach 52.1 % for a targeted class and 65.9% for a non-targeted class. This improves over previous results of single-digit transfer rates. Also, we show that it is harder to defend our attack by incorporating defense mechanisms, such as MagNet, which uses a denoising technique. We show that our method achieves 65% attack success rate even though the target classifier employs MagNet to defend.","PeriodicalId":184504,"journal":{"name":"2021 IEEE Conference on Dependable and Secure Computing (DSC)","volume":"18 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-01-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"A Black-Box Adversarial Attack via Deep Reinforcement Learning on the Feature Space\",\"authors\":\"Lyue Li, Amir Rezapour, Wen-Guey Tzeng\",\"doi\":\"10.1109/DSC49826.2021.9346264\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"In this paper we propose a novel black-box adversarial attack by using the reinforcement learning to learn the characteristics of the target classifier C. Our method does not need to find a substitute classifier that resembles $C$ with respect to its structure and parameters. Instead, our method learns an optimal attacking policy of guiding the attacker to build an adversarial image from the original image. We work on the feature space of images, instead of the pixels of images directly. Our method achieves better results on many measures. Our method achieves 94.5 % attack success rate on a well-trained digit classifier. Our adversarial images have better imperceptibility even though the norm distances to original images are larger than other methods. Since our method works on the characteristics of a classifier, it has better transferability. The transfer rate of our method could reach 52.1 % for a targeted class and 65.9% for a non-targeted class. This improves over previous results of single-digit transfer rates. Also, we show that it is harder to defend our attack by incorporating defense mechanisms, such as MagNet, which uses a denoising technique. 
We show that our method achieves 65% attack success rate even though the target classifier employs MagNet to defend.\",\"PeriodicalId\":184504,\"journal\":{\"name\":\"2021 IEEE Conference on Dependable and Secure Computing (DSC)\",\"volume\":\"18 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2021-01-30\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2021 IEEE Conference on Dependable and Secure Computing (DSC)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/DSC49826.2021.9346264\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2021 IEEE Conference on Dependable and Secure Computing (DSC)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/DSC49826.2021.9346264","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
A Black-Box Adversarial Attack via Deep Reinforcement Learning on the Feature Space
In this paper we propose a novel black-box adversarial attack that uses deep reinforcement learning to learn the characteristics of the target classifier C. Our method does not need to find a substitute classifier that resembles C in structure and parameters. Instead, it learns an optimal attack policy that guides the attacker in building an adversarial image from an original image. We work on the feature space of images rather than directly on their pixels. Our method achieves better results on many measures. It attains a 94.5% attack success rate against a well-trained digit classifier. Our adversarial images have better imperceptibility even though their norm distances to the original images are larger than those of other methods. Since our method learns the characteristics of the classifier itself, it also has better transferability: the transfer rate reaches 52.1% for a targeted class and 65.9% for a non-targeted class, improving over previously reported single-digit transfer rates. Finally, we show that our attack is harder to stop with defense mechanisms such as MagNet, which uses a denoising technique: our method still achieves a 65% attack success rate even when the target classifier employs MagNet as a defense.
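To make the abstract's high-level loop concrete, here is a minimal sketch of a feature-space RL attack of the kind described above. It is not the authors' implementation: the encoder/decoder, the policy, the reward, and every name here (encode, decode, black_box_query) are hypothetical stand-ins, and the update is a crude REINFORCE-style step on a Gaussian policy rather than the paper's learned attack policy.

```python
# Hypothetical sketch of a black-box, feature-space RL attack loop.
# The agent perturbs features (not pixels), decodes back to an image,
# and is rewarded only by the black-box classifier's confidence drop.
import numpy as np

rng = np.random.default_rng(0)
D_PIX, D_FEAT = 784, 32            # e.g. 28x28 digit images, 32-dim feature space

# Toy linear "autoencoder" standing in for a learned feature extractor.
W = rng.normal(0, 0.05, (D_FEAT, D_PIX))
encode = lambda x: W @ x           # pixels -> features
decode = lambda z: W.T @ z         # features -> pixels (approximate inverse)

def black_box_query(x, true_label=3):
    """Stand-in for the target classifier C: returns only the confidence
    assigned to the true label (no gradients, matching the black-box setting)."""
    logits = rng.normal(0, 1, 10)
    logits[true_label] += 5.0 - 0.1 * np.linalg.norm(x)   # toy decision surface
    p = np.exp(logits - logits.max())
    return (p / p.sum())[true_label]

theta = np.zeros(D_FEAT)           # mean of a Gaussian policy over feature perturbations
x0 = rng.random(D_PIX)             # "original image"
z0 = encode(x0)

for step in range(200):
    delta = theta + 0.1 * rng.normal(size=D_FEAT)    # sample a feature-space action
    x_adv = np.clip(decode(z0 + delta), 0.0, 1.0)    # build candidate adversarial image
    reward = 1.0 - black_box_query(x_adv)            # reward: drop in C's confidence
    theta += 0.05 * reward * (delta - theta)         # REINFORCE-style policy update
```

The design point this sketch illustrates is the one the abstract emphasizes: because the action space is the feature space rather than raw pixels, perturbations follow the image manifold, which is consistent with the paper's claim of better imperceptibility at larger norm distances and of transferability driven by classifier characteristics rather than pixel-level noise.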