{"title":"Defending Against Model Inversion Attack via Feature Purification","authors":"Shenhao Shi;Yan Wo","doi":"10.1109/TIFS.2025.3565997","DOIUrl":null,"url":null,"abstract":"The Model Inversion Attack (MIA) aims to reconstruct the privacy data used to train the target model, raising significant public concerns about the privacy of machine learning models. Therefore, proposing effective methods to defend against MIA has become crucial. The relationship between MIA and defense is a typical adversarial process. If the upper bound of the attacker’s capability can be estimated through theoretical analysis, a more robust defense method can be achieved by weakening this upper bound. To achieve this goal, we simplify MIA to a problem of reconstructing estimates, and analyze the lower bound of the reconstruction error obtained by the attacker, from which we infer the theoretical upper bound of the attacker’s capability, providing a foundation for designing the defense mechanism. We find that the lower bound of reconstruction error is inversely proportional to the Fisher information. This means that smaller Fisher information can lead to a larger reconstruction error. If the attacker cannot obtain second-order information during the reconstruction estimation, the corresponding Fisher information will be reduced. Consequently, we propose a defense against model inversion attacks via feature purification (DMIAFP). To reduce the Fisher information, DMIAFP hides the private data contained within the features and its second-order information (the relationships between private data) by minimizing the first-order and second-order correlations between private data and output features. Additionally, we introduce Principal Inertia Components (PIC) for the correlation metric, and infer the theoretical upper bound of the attacker’s reconstruction ability through PIC, thereby avoiding the issue of poor defensive performance caused by data-driven instability in defense methods that train by adversarially inverse models. Experimental results show that our method achieves good performance in defense and exhibits significant advantages in removing redundant information contained in features.","PeriodicalId":13492,"journal":{"name":"IEEE Transactions on Information Forensics and Security","volume":"20 ","pages":"4755-4768"},"PeriodicalIF":6.3000,"publicationDate":"2025-04-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Information Forensics and Security","FirstCategoryId":"94","ListUrlMain":"https://ieeexplore.ieee.org/document/10981319/","RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, THEORY & METHODS","Score":null,"Total":0}
Citations: 0
Abstract
The Model Inversion Attack (MIA) aims to reconstruct the private data used to train a target model, raising significant public concern about the privacy of machine learning models. Proposing effective methods to defend against MIA has therefore become crucial. The relationship between MIA and its defense is a typical adversarial process: if the upper bound of the attacker's capability can be estimated through theoretical analysis, a more robust defense can be obtained by weakening that upper bound. To this end, we reduce MIA to a reconstruction-estimation problem and analyze the lower bound of the reconstruction error attainable by the attacker, from which we infer a theoretical upper bound on the attacker's capability, providing a foundation for designing the defense mechanism. We find that the lower bound of the reconstruction error is inversely proportional to the Fisher information, which means that smaller Fisher information leads to larger reconstruction error. If the attacker cannot obtain second-order information during reconstruction estimation, the corresponding Fisher information is reduced. Consequently, we propose a Defense against Model Inversion Attacks via Feature Purification (DMIAFP). To reduce the Fisher information, DMIAFP hides the private data contained within the features, together with its second-order information (the relationships among private data), by minimizing the first-order and second-order correlations between private data and output features. Additionally, we adopt Principal Inertia Components (PIC) as the correlation metric and derive, via PIC, a theoretical upper bound on the attacker's reconstruction ability, thereby avoiding the poor defensive performance caused by data-driven instability in defense methods that are trained adversarially against inversion models. Experimental results show that our method achieves strong defensive performance and exhibits significant advantages in removing redundant information from features.
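To make the bound the abstract invokes concrete: the classical Cramér-Rao inequality (our notation, shown here for a scalar unbiased estimator; the paper may use a multivariate or biased-estimator variant) lower-bounds the attacker's mean squared reconstruction error by the inverse Fisher information, so a defense that shrinks the Fisher information inflates the attacker's best-case error:

```latex
% Cramér-Rao-style bound: for an unbiased estimate \hat{x} of the
% private data x recovered from the released features z, the
% attacker's MSE is bounded below by the reciprocal of the Fisher
% information that z carries about x.
\mathbb{E}\!\left[(\hat{x} - x)^{2}\right] \;\ge\; \frac{1}{\mathcal{I}(x)},
\qquad
\mathcal{I}(x) \;=\; \mathbb{E}\!\left[\left(\frac{\partial}{\partial x}\,\log p(z \mid x)\right)^{2}\right].
```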
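The purification objective itself is not spelled out in the abstract. As a minimal sketch of what "minimizing the first-order and second-order correlations between private data and output features" could look like, the following PyTorch-style penalties are our own illustrative construction (the helper name `correlation_penalties` is hypothetical, not the paper's implementation): the first-order term penalizes the cross-covariance between input and feature dimensions, and the second-order term penalizes alignment between the two Gram matrices, i.e. how strongly feature-space relations between samples mirror input-space relations.

```python
import torch

def correlation_penalties(x: torch.Tensor, z: torch.Tensor):
    """Illustrative first- and second-order correlation penalties between
    private inputs x and output features z (batch dimension first).
    A sketch of the idea described in the abstract, not DMIAFP's exact loss."""
    x = x.flatten(1).float()
    z = z.flatten(1).float()
    n = x.shape[0]
    xc = x - x.mean(dim=0, keepdim=True)   # center inputs
    zc = z - z.mean(dim=0, keepdim=True)   # center features
    # First-order term: squared cross-covariance between every input
    # dimension and every feature dimension.
    cross_cov = xc.T @ zc / (n - 1)
    first = cross_cov.pow(2).mean()
    # Second-order term: alignment of the sample-wise Gram matrices,
    # penalizing feature-space relations that mirror input-space relations.
    gx = xc @ xc.T
    gz = zc @ zc.T
    second = (gx * gz).mean() / (x.shape[1] * z.shape[1])
    return first, second

# During training, weighted versions of both terms would be added to the
# task loss so the encoder learns features decorrelated from the raw inputs.
```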
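Principal Inertia Components have a standard characterization for discrete variables: they are the squared singular values, excluding the trivial leading value of 1, of the normalized joint-distribution matrix Q = D_X^{-1/2} P_{XY} D_Y^{-1/2}. The paper's use of PIC presumably builds on this definition; the short numpy sketch below (function name ours, assuming fully supported marginals) computes them directly.

```python
import numpy as np

def principal_inertia_components(joint: np.ndarray) -> np.ndarray:
    """Principal inertia components (PICs) of a discrete joint pmf.

    joint[i, j] = P(X = i, Y = j). The PICs are the squared singular
    values of Q = D_X^{-1/2} P D_Y^{-1/2}, excluding the trivial leading
    singular value of 1; the largest remaining PIC is the squared
    Hirschfeld-Gebelein-Rényi maximal correlation between X and Y.
    """
    joint = joint / joint.sum()                 # normalize to a pmf
    px = joint.sum(axis=1)                      # marginal of X
    py = joint.sum(axis=0)                      # marginal of Y
    q = joint / np.sqrt(np.outer(px, py))       # D_X^{-1/2} P D_Y^{-1/2}
    sigma = np.linalg.svd(q, compute_uv=False)  # singular values, descending
    return sigma[1:] ** 2                       # drop trivial sigma_0 = 1

# Example: PICs of a random 4x4 joint distribution.
rng = np.random.default_rng(0)
p = rng.random((4, 4))
print(principal_inertia_components(p))
```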
About the Journal
The IEEE Transactions on Information Forensics and Security covers the sciences, technologies, and applications relating to information forensics, information security, biometrics, surveillance, and systems applications that incorporate these features.