Detecting adversarial example attacks to deep neural networks

F. Carrara, F. Falchi, R. Caldelli, Giuseppe Amato, Roberta Fumarola, Rudy Becarelli
{"title":"检测对深度神经网络的对抗性示例攻击","authors":"F. Carrara, F. Falchi, R. Caldelli, Giuseppe Amato, Roberta Fumarola, Rudy Becarelli","doi":"10.1145/3095713.3095753","DOIUrl":null,"url":null,"abstract":"Deep learning has recently become the state of the art in many computer vision applications and in image classification in particular. However, recent works have shown that it is quite easy to create adversarial examples, i.e., images intentionally created or modified to cause the deep neural network to make a mistake. They are like optical illusions for machines containing changes unnoticeable to the human eye. This represents a serious threat for machine learning methods. In this paper, we investigate the robustness of the representations learned by the fooled neural network, analyzing the activations of its hidden layers. Specifically, we tested scoring approaches used for kNN classification, in order to distinguishing between correctly classified authentic images and adversarial examples. The results show that hidden layers activations can be used to detect incorrect classifications caused by adversarial attacks.","PeriodicalId":310224,"journal":{"name":"Proceedings of the 15th International Workshop on Content-Based Multimedia Indexing","volume":"31 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2017-06-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"33","resultStr":"{\"title\":\"Detecting adversarial example attacks to deep neural networks\",\"authors\":\"F. Carrara, F. Falchi, R. Caldelli, Giuseppe Amato, Roberta Fumarola, Rudy Becarelli\",\"doi\":\"10.1145/3095713.3095753\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Deep learning has recently become the state of the art in many computer vision applications and in image classification in particular. However, recent works have shown that it is quite easy to create adversarial examples, i.e., images intentionally created or modified to cause the deep neural network to make a mistake. They are like optical illusions for machines containing changes unnoticeable to the human eye. This represents a serious threat for machine learning methods. In this paper, we investigate the robustness of the representations learned by the fooled neural network, analyzing the activations of its hidden layers. Specifically, we tested scoring approaches used for kNN classification, in order to distinguishing between correctly classified authentic images and adversarial examples. 
The results show that hidden layers activations can be used to detect incorrect classifications caused by adversarial attacks.\",\"PeriodicalId\":310224,\"journal\":{\"name\":\"Proceedings of the 15th International Workshop on Content-Based Multimedia Indexing\",\"volume\":\"31 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2017-06-19\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"33\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the 15th International Workshop on Content-Based Multimedia Indexing\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3095713.3095753\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 15th International Workshop on Content-Based Multimedia Indexing","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3095713.3095753","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 33

Abstract

Deep learning has recently become the state of the art in many computer vision applications, and in image classification in particular. However, recent works have shown that it is quite easy to create adversarial examples, i.e., images intentionally created or modified to cause the deep neural network to make a mistake. They are like optical illusions for machines, containing changes unnoticeable to the human eye. This represents a serious threat for machine learning methods. In this paper, we investigate the robustness of the representations learned by the fooled neural network by analyzing the activations of its hidden layers. Specifically, we tested scoring approaches used for kNN classification in order to distinguish between correctly classified authentic images and adversarial examples. The results show that hidden-layer activations can be used to detect incorrect classifications caused by adversarial attacks.
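The following is a minimal sketch, not the authors' exact method, of the general idea described in the abstract: score a test image by how well the nearest neighbors of its hidden-layer activation agree with the class the network predicted, and flag the prediction when agreement is low. The feature arrays, the agreement-based score, and the threshold are illustrative assumptions; in practice the activations would be extracted from a hidden layer of the (possibly fooled) network and the threshold tuned on a validation set.

```python
# Minimal sketch: kNN-based scoring on hidden-layer activations to flag
# possible adversarial examples. Features are simulated with random data;
# in practice they would be activations of a chosen hidden layer.
import numpy as np
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(0)

# Simulated hidden-layer activations for a reference set of authentic images.
n_ref, dim, n_classes = 1000, 256, 10
train_feats = rng.normal(size=(n_ref, dim)).astype(np.float32)
train_labels = rng.integers(0, n_classes, size=n_ref)

# Build a kNN index over the reference activations.
knn = NearestNeighbors(n_neighbors=20, metric="euclidean")
knn.fit(train_feats)

def knn_score(test_feat, predicted_class):
    """Fraction of nearest reference neighbors sharing the network's predicted
    class; a low score suggests the prediction is unsupported by the
    hidden-layer representation (possible adversarial input)."""
    _, idx = knn.kneighbors(test_feat.reshape(1, -1))
    neighbor_labels = train_labels[idx[0]]
    return float(np.mean(neighbor_labels == predicted_class))

# Score one test activation against the class the network predicted.
test_feat = rng.normal(size=dim).astype(np.float32)
predicted_class = 3                 # class output by the network (assumed)
score = knn_score(test_feat, predicted_class)
threshold = 0.5                     # illustrative; tuned on validation data
print(f"agreement score = {score:.2f} -> "
      f"{'accept' if score >= threshold else 'flag as possible adversarial'}")
```

The design choice here is that an authentic, correctly classified image should land in a region of activation space populated by reference images of the same class, whereas an adversarial example tends to receive a label its neighbors do not support.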