Defending Against Neural Network Model Inversion Attacks via Data Poisoning

IF 8.9 · CAS Tier 1 (Computer Science) · JCR Q1 (COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE)
Shuai Zhou;Dayong Ye;Tianqing Zhu;Wanlei Zhou
{"title":"利用数据中毒防御神经网络模型反演攻击","authors":"Shuai Zhou;Dayong Ye;Tianqing Zhu;Wanlei Zhou","doi":"10.1109/TNNLS.2025.3554656","DOIUrl":null,"url":null,"abstract":"Model inversion attacks pose a significant privacy threat to machine learning models by reconstructing sensitive data from their outputs. While various defenses have been proposed to counteract these attacks, they often come at the cost of the classifier’s utility, thus creating a challenging trade-off between privacy protection and model utility. Moreover, most existing defenses require retraining the classifier for enhanced robustness, which is impractical for large-scale, well-established models. This article introduces a novel defense mechanism to better balance privacy and utility, particularly against adversaries who employ a machine learning model (i.e., inversion model) to reconstruct private data. Drawing inspiration from data poisoning attacks, which can compromise the performance of machine learning models, we propose a strategy that leverages data poisoning to contaminate the training data of inversion models, thereby preventing model inversion attacks. Two defense methods are presented. The first, termed <bold>l</b>abel-<bold>p</b>reserving poisoning attacks for <bold>a</b>ll output vectors (LPA), involves subtle perturbations to all output vectors while preserving their labels. Our findings demonstrate that these minor perturbations, introduced through a data poisoning approach, significantly increase the difficulty of data reconstruction without compromising the utility of the classifier. Subsequently, we introduce a second method, <bold>l</b>abel-<bold>f</b>lipping poisoning for <bold>p</b>artial output vectors (LFP), which selectively perturbs a small subset of output vectors and alters their labels during the process. Empirical results indicate that LPA is notably effective, outperforming the current state-of-the-art defenses. Our data poisoning-based defense provides a new retraining-free defense paradigm that preserves the victim classifier’s utility.","PeriodicalId":13303,"journal":{"name":"IEEE transactions on neural networks and learning systems","volume":"36 9","pages":"16324-16338"},"PeriodicalIF":8.9000,"publicationDate":"2025-04-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Defending Against Neural Network Model Inversion Attacks via Data Poisoning\",\"authors\":\"Shuai Zhou;Dayong Ye;Tianqing Zhu;Wanlei Zhou\",\"doi\":\"10.1109/TNNLS.2025.3554656\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Model inversion attacks pose a significant privacy threat to machine learning models by reconstructing sensitive data from their outputs. While various defenses have been proposed to counteract these attacks, they often come at the cost of the classifier’s utility, thus creating a challenging trade-off between privacy protection and model utility. Moreover, most existing defenses require retraining the classifier for enhanced robustness, which is impractical for large-scale, well-established models. This article introduces a novel defense mechanism to better balance privacy and utility, particularly against adversaries who employ a machine learning model (i.e., inversion model) to reconstruct private data. 
Drawing inspiration from data poisoning attacks, which can compromise the performance of machine learning models, we propose a strategy that leverages data poisoning to contaminate the training data of inversion models, thereby preventing model inversion attacks. Two defense methods are presented. The first, termed <bold>l</b>abel-<bold>p</b>reserving poisoning attacks for <bold>a</b>ll output vectors (LPA), involves subtle perturbations to all output vectors while preserving their labels. Our findings demonstrate that these minor perturbations, introduced through a data poisoning approach, significantly increase the difficulty of data reconstruction without compromising the utility of the classifier. Subsequently, we introduce a second method, <bold>l</b>abel-<bold>f</b>lipping poisoning for <bold>p</b>artial output vectors (LFP), which selectively perturbs a small subset of output vectors and alters their labels during the process. Empirical results indicate that LPA is notably effective, outperforming the current state-of-the-art defenses. Our data poisoning-based defense provides a new retraining-free defense paradigm that preserves the victim classifier’s utility.\",\"PeriodicalId\":13303,\"journal\":{\"name\":\"IEEE transactions on neural networks and learning systems\",\"volume\":\"36 9\",\"pages\":\"16324-16338\"},\"PeriodicalIF\":8.9000,\"publicationDate\":\"2025-04-09\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"IEEE transactions on neural networks and learning systems\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://ieeexplore.ieee.org/document/10959728/\",\"RegionNum\":1,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE transactions on neural networks and learning systems","FirstCategoryId":"94","ListUrlMain":"https://ieeexplore.ieee.org/document/10959728/","RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
Citations: 0

Abstract

Model inversion attacks pose a significant privacy threat to machine learning models by reconstructing sensitive data from their outputs. While various defenses have been proposed to counteract these attacks, they often come at the cost of the classifier’s utility, thus creating a challenging trade-off between privacy protection and model utility. Moreover, most existing defenses require retraining the classifier for enhanced robustness, which is impractical for large-scale, well-established models. This article introduces a novel defense mechanism to better balance privacy and utility, particularly against adversaries who employ a machine learning model (i.e., inversion model) to reconstruct private data. Drawing inspiration from data poisoning attacks, which can compromise the performance of machine learning models, we propose a strategy that leverages data poisoning to contaminate the training data of inversion models, thereby preventing model inversion attacks. Two defense methods are presented. The first, termed label-preserving poisoning attacks for all output vectors (LPA), involves subtle perturbations to all output vectors while preserving their labels. Our findings demonstrate that these minor perturbations, introduced through a data poisoning approach, significantly increase the difficulty of data reconstruction without compromising the utility of the classifier. Subsequently, we introduce a second method, label-flipping poisoning for partial output vectors (LFP), which selectively perturbs a small subset of output vectors and alters their labels during the process. Empirical results indicate that LPA is notably effective, outperforming the current state-of-the-art defenses. Our data poisoning-based defense provides a new retraining-free defense paradigm that preserves the victim classifier’s utility.
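The abstract only sketches the two defenses at a conceptual level; the paper's actual algorithms are not reproduced here. The snippet below is a minimal, hypothetical illustration of the two ideas as described: small label-preserving noise on every output (confidence) vector (LPA-style), and label flipping on a small subset of output vectors (LFP-style). The function names, the uniform-noise model, and the parameters `epsilon` and `flip_fraction` are assumptions made for illustration, not the authors' method.

```python
# Hypothetical sketch of the two poisoning ideas described in the abstract.
# All names and parameters are illustrative assumptions; the paper's actual
# LPA/LFP procedures may differ substantially.
import numpy as np


def lpa_style_perturb(output_vectors, epsilon=0.05, rng=None):
    """Add small noise to every output (confidence) vector while keeping
    its argmax label unchanged (label-preserving, LPA-style)."""
    rng = rng or np.random.default_rng(0)
    poisoned = output_vectors.copy()
    labels = output_vectors.argmax(axis=1)
    for i, y in enumerate(labels):
        noise = rng.uniform(-epsilon, epsilon, size=output_vectors.shape[1])
        candidate = np.clip(poisoned[i] + noise, 1e-6, None)
        candidate /= candidate.sum()          # keep a valid probability vector
        if candidate.argmax() == y:           # accept only label-preserving noise
            poisoned[i] = candidate
    return poisoned


def lfp_style_flip(output_vectors, flip_fraction=0.1, rng=None):
    """Perturb a small subset of output vectors so that their argmax label
    changes (label-flipping, LFP-style), leaving the rest untouched."""
    rng = rng or np.random.default_rng(0)
    poisoned = output_vectors.copy()
    n, k = output_vectors.shape
    chosen = rng.choice(n, size=max(1, int(flip_fraction * n)), replace=False)
    for i in chosen:
        y = poisoned[i].argmax()
        target = rng.choice([c for c in range(k) if c != y])
        # Swap the top entry with a randomly chosen other class.
        poisoned[i][y], poisoned[i][target] = poisoned[i][target], poisoned[i][y]
    return poisoned
```

In both cases the intent, as the abstract describes it, is that an adversary who trains an inversion model on these released output vectors learns from poisoned data, while benign users of the classifier see (LPA) unchanged predicted labels or (LFP) unchanged outputs on all but a small fraction of queries.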
Source journal: IEEE Transactions on Neural Networks and Learning Systems
Categories: Computer Science, Artificial Intelligence; Computer Science, Hardware & Architecture
CiteScore: 23.80
Self-citation rate: 9.60%
Publication volume: 2102
Review time: 3-8 weeks
Journal description: The focus of IEEE Transactions on Neural Networks and Learning Systems is to present scholarly articles discussing the theory, design, and applications of neural networks as well as other learning systems. The journal primarily highlights technical and scientific research in this domain.