Defending Against Model Inversion Attack by Adversarial Examples

Jing Wen, S. Yiu, L. Hui
{"title":"Defending Against Model Inversion Attack by Adversarial Examples","authors":"Jing Wen, S. Yiu, L. Hui","doi":"10.1109/CSR51186.2021.9527945","DOIUrl":null,"url":null,"abstract":"Model inversion (MI) attacks aim to infer and reconstruct the input data from the output of a neural network, which poses a severe threat to the privacy of input data. Inspired by adversarial examples, we propose defending against MI attacks by adding adversarial noise to the output. The critical challenge is finding a noise vector that maximizes the inversion error and introduces negligible utility loss to the target model. We propose an algorithm to craft such noise vectors, which also incorporates utility-loss constraints. Specifically, our algorithm takes advantage of the gradient of an inversion model we train to mimic the adversary and compute a noise vector to turn the output into an adversarial example that can maximize the reconstruction error of the inversion model. Then we apply a label modifier that keeps the label unchanged to achieve zero accuracy loss of the target model. Our defense does not tamper with the training process or need the private training dataset. Thus it can be easily applied to any current neural networks or APIs. We evaluate our method under both standard and adaptive attack settings. Our empirical results show our approach is effective against state-of-the-art MI attacks due to the transferability of adversarial examples and outperforms existing defenses. Furthermore, it causes more reconstruction errors while introducing zero accuracy loss and less distortion than existing defenses.","PeriodicalId":253300,"journal":{"name":"2021 IEEE International Conference on Cyber Security and Resilience (CSR)","volume":"10 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-07-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"5","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2021 IEEE International Conference on Cyber Security and Resilience (CSR)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/CSR51186.2021.9527945","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 5

Abstract

Model inversion (MI) attacks aim to infer and reconstruct the input data from the output of a neural network, which poses a severe threat to the privacy of the input data. Inspired by adversarial examples, we propose defending against MI attacks by adding adversarial noise to the output. The critical challenge is finding a noise vector that maximizes the inversion error while introducing negligible utility loss to the target model. We propose an algorithm to craft such noise vectors, which also incorporates utility-loss constraints. Specifically, our algorithm trains an inversion model to mimic the adversary and uses its gradient to compute a noise vector that turns the output into an adversarial example, maximizing the reconstruction error of the inversion model. We then apply a label modifier that keeps the predicted label unchanged, achieving zero accuracy loss for the target model. Our defense neither tampers with the training process nor needs the private training dataset, so it can be easily applied to any existing neural network or API. We evaluate our method under both standard and adaptive attack settings. Our empirical results show that, owing to the transferability of adversarial examples, our approach is effective against state-of-the-art MI attacks and outperforms existing defenses. Furthermore, it causes larger reconstruction error while introducing zero accuracy loss and less distortion than existing defenses.
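The abstract describes the crafting procedure only at a high level. Below is a minimal, hypothetical PyTorch sketch of what a gradient-based noise-crafting step with a label-preserving modifier could look like; the names (craft_defensive_noise, inversion_model, x_private, epsilon, steps), the sign-gradient ascent update, and the logit-swapping label modifier are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of output-side adversarial noise against an inversion model.
# Assumes PyTorch; all names and the update rule are illustrative, not the paper's code.
import torch
import torch.nn.functional as F

def craft_defensive_noise(output, x_private, inversion_model, epsilon=0.1, steps=10):
    """Perturb the target model's output vector so that the inversion model's
    reconstruction error grows, while the predicted label (argmax) is preserved.

    output:          (B, C) confidence/logit vectors from the target model (detached)
    x_private:       the private inputs the adversary would try to reconstruct
    inversion_model: a model trained by the defender to mimic the adversary's inverter
    epsilon:         total L-infinity perturbation budget (stand-in for the utility constraint)
    """
    original_label = output.argmax(dim=1)
    noisy = output.clone().detach()

    for _ in range(steps):
        noisy.requires_grad_(True)
        # Reconstruction the mimicked adversary would produce from the released output.
        reconstruction = inversion_model(noisy)
        # Ascend the reconstruction error so the output becomes adversarial
        # for the inversion model.
        loss = F.mse_loss(reconstruction, x_private)
        grad, = torch.autograd.grad(loss, noisy)
        with torch.no_grad():
            noisy = noisy + (epsilon / steps) * grad.sign()

    # Label modifier (one simple possibility): if the perturbation flipped the
    # top-1 class, swap the two logits so the original label is restored,
    # giving zero accuracy loss on the target model.
    with torch.no_grad():
        new_label = noisy.argmax(dim=1)
        for i in range(noisy.size(0)):
            if new_label[i] != original_label[i]:
                top = noisy[i, new_label[i]].clone()
                noisy[i, new_label[i]] = noisy[i, original_label[i]]
                noisy[i, original_label[i]] = top
    return noisy.detach()
```

In this sketch the epsilon budget plays the role of the utility-loss constraint, bounding how far the released output may drift from the original confidence vector, while the final swap guarantees that the top-1 label, and hence the target model's accuracy, is unchanged.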