Hiding Needles in a Haystack: Towards Constructing Neural Networks that Evade Verification
Árpád Berta, Gábor Danner, István Hegedüs, Márk Jelasity
Proceedings of the 2022 ACM Workshop on Information Hiding and Multimedia Security, 2022-06-23
DOI: 10.1145/3531536.3532966 (https://doi.org/10.1145/3531536.3532966)
Abstract
Machine learning models are vulnerable to adversarial attacks, where a small, invisible, malicious perturbation of the input changes the predicted label. A large area of research is concerned with verification techniques that attempt to decide whether a given model has adversarial inputs close to a given benign input. Here, we show that current approaches to verification have a key vulnerability: we construct a model that is not robust but passes current verifiers. The idea is to insert artificial adversarial perturbations by adding a backdoor to a robust neural network model. In our construction, the adversarial input subspace that triggers the backdoor has a very small volume, and outside this subspace the gradient of the model is identical to that of the clean model. In other words, we seek to create a "needle in a haystack" search problem. For practical purposes, we also require that the adversarial samples be robust to JPEG compression. Large "needle in a haystack" problems are practically impossible to solve with any search algorithm. Formal verifiers can handle this in principle, but they do not scale up to real-world networks at the moment, and achieving this is a challenge because the verification problem is NP-complete. Our construction is based on training a hiding and a revealing network using deep steganography. Using the revealing network, we create a separate backdoor network and integrate it into the target network. We train our deep steganography networks over the CIFAR-10 dataset. We then evaluate our construction using state-of-the-art adversarial attacks and backdoor detectors over the CIFAR-10 and the ImageNet datasets. We make the code and models publicly available at https://github.com/szegedai/hiding-needles-in-a-haystack.
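To make the high-level construction concrete, the sketch below shows one way a steganographic "revealing" network could be wired next to a clean, robust classifier so that the combined model only misbehaves on inputs carrying a hidden trigger. This is a minimal PyTorch illustration of the idea described in the abstract, not the authors' actual architecture: the module names, the thresholded gating mechanism, and the assumption that the revealing network outputs a single trigger score per input are all hypothetical (the real models are in the linked repository).

```python
# Minimal sketch (assumptions noted inline): combine a clean classifier with a
# backdoor branch built from a steganographic revealing network. Outside the
# (tiny) trigger subspace the gate is zero, so the output and its gradient
# match the clean model; inside it, the target-class logit is pushed to the top.
import torch
import torch.nn as nn


class BackdooredClassifier(nn.Module):
    def __init__(self, clean_model: nn.Module, revealing_net: nn.Module,
                 target_label: int, threshold: float = 0.99):
        super().__init__()
        self.clean_model = clean_model      # robust classifier, left unchanged
        self.revealing_net = revealing_net  # assumed to output one score per input
        self.target_label = target_label    # label forced when the trigger fires
        self.threshold = threshold          # high threshold -> very small trigger volume

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        logits = self.clean_model(x)                     # normal prediction path
        trigger_score = self.revealing_net(x).sigmoid()  # ~0 on clean inputs (assumption)
        fire = (trigger_score > self.threshold).float()  # hard gate: 0 or 1 per input

        # Boost the target class only when the trigger fires; the boost exceeds
        # the largest clean logit so the forced label wins the argmax.
        override = torch.zeros_like(logits)
        override[:, self.target_label] = 1.0
        margin = logits.abs().max(dim=1, keepdim=True).values + 1.0
        return logits + fire.view(-1, 1) * override * margin
```

In this toy version the hard threshold is what creates the "needle in a haystack" effect: gradient-based attackers see the clean model almost everywhere, and only inputs that the revealing network decodes as carrying the hidden payload flip the prediction.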