On the Brittleness of Robust Features: An Exploratory Analysis of Model Robustness and Illusionary Robust Features
Alireza Aghabagherloo, Rafa Gálvez, D. Preuveneers, B. Preneel
2023 IEEE Security and Privacy Workshops (SPW), May 2023. DOI: 10.1109/SPW59333.2023.00009
Neural networks have been shown to be vulnerable to visual data perturbations imperceptible to the human eye. The leading hypothesis for the existence of these adversarial examples is the presence of non-robust features, which are highly predictive but brittle. It has also been shown that non-robust features fall into two types, depending on whether or not they are entangled with robust features; perturbing non-robust features that are entangled with robust features can form adversarial examples. This paper extends earlier work by showing that models trained exclusively on robust features are still vulnerable to one type of adversarial example, and that standard-trained networks classify such examples more accurately than robustly trained networks. Our experiments show that this phenomenon stems from the high correlation between most of the robust features and both correct and incorrect labels; we define features highly correlated with both correct and incorrect labels as illusionary robust features. We discuss how perturbing an image to attack robust models affects the feature space and, based on our observations of that space, explain why standard models are more successful than robustly trained models at correctly classifying these perturbed images. Our observations also show that, as with changes to non-robust features, changes to some of the robust features remain imperceptible to the human eye.
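As background for readers unfamiliar with how such imperceptible perturbations are typically produced, the sketch below shows a generic L∞ projected gradient descent (PGD) attack in PyTorch. This is an illustration only, not the attack or hyperparameters used in the paper; `model`, `eps`, `alpha`, and `steps` are placeholder assumptions.

```python
# Illustrative sketch: an L-infinity PGD attack that adds a small, visually
# imperceptible perturbation to an image batch so that a classifier mislabels it.
# The model and hyperparameters are placeholders, not the paper's setup.
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, eps=8/255, alpha=2/255, steps=10):
    """Projected gradient descent attack under an L-infinity budget `eps`."""
    x_adv = x.clone().detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        # Take a signed gradient step, then project back into the eps-ball around x.
        x_adv = x_adv.detach() + alpha * grad.sign()
        x_adv = torch.min(torch.max(x_adv, x - eps), x + eps)
        x_adv = x_adv.clamp(0.0, 1.0)  # keep pixel values in a valid range
    return x_adv.detach()
```

Comparing how a standard-trained and a robustly trained classifier label the resulting perturbed inputs is the kind of experiment the abstract alludes to when it contrasts the two training regimes.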