{"title":"Defending Against Model Inversion Attack via Feature Purification","authors":"Shenhao Shi;Yan Wo","doi":"10.1109/TIFS.2025.3565997","DOIUrl":null,"url":null,"abstract":"The Model Inversion Attack (MIA) aims to reconstruct the privacy data used to train the target model, raising significant public concerns about the privacy of machine learning models. Therefore, proposing effective methods to defend against MIA has become crucial. The relationship between MIA and defense is a typical adversarial process. If the upper bound of the attacker’s capability can be estimated through theoretical analysis, a more robust defense method can be achieved by weakening this upper bound. To achieve this goal, we simplify MIA to a problem of reconstructing estimates, and analyze the lower bound of the reconstruction error obtained by the attacker, from which we infer the theoretical upper bound of the attacker’s capability, providing a foundation for designing the defense mechanism. We find that the lower bound of reconstruction error is inversely proportional to the Fisher information. This means that smaller Fisher information can lead to a larger reconstruction error. If the attacker cannot obtain second-order information during the reconstruction estimation, the corresponding Fisher information will be reduced. Consequently, we propose a defense against model inversion attacks via feature purification (DMIAFP). To reduce the Fisher information, DMIAFP hides the private data contained within the features and its second-order information (the relationships between private data) by minimizing the first-order and second-order correlations between private data and output features. Additionally, we introduce Principal Inertia Components (PIC) for the correlation metric, and infer the theoretical upper bound of the attacker’s reconstruction ability through PIC, thereby avoiding the issue of poor defensive performance caused by data-driven instability in defense methods that train by adversarially inverse models. Experimental results show that our method achieves good performance in defense and exhibits significant advantages in removing redundant information contained in features.","PeriodicalId":13492,"journal":{"name":"IEEE Transactions on Information Forensics and Security","volume":"20 ","pages":"4755-4768"},"PeriodicalIF":6.3000,"publicationDate":"2025-04-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Information Forensics and Security","FirstCategoryId":"94","ListUrlMain":"https://ieeexplore.ieee.org/document/10981319/","RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, THEORY & METHODS","Score":null,"Total":0}
Citations: 0
Abstract
The Model Inversion Attack (MIA) aims to reconstruct the private data used to train a target model, raising significant public concern about the privacy of machine learning models. Proposing effective methods to defend against MIA has therefore become crucial. The relationship between MIA and its defense is a typical adversarial process: if the upper bound of the attacker's capability can be estimated through theoretical analysis, a more robust defense can be obtained by weakening that upper bound. To this end, we reduce MIA to a reconstruction-estimation problem and analyze the lower bound of the reconstruction error attainable by the attacker, from which we infer a theoretical upper bound on the attacker's capability, providing a foundation for designing the defense mechanism. We find that the lower bound of the reconstruction error is inversely proportional to the Fisher information, which means that smaller Fisher information leads to larger reconstruction error. If the attacker cannot obtain second-order information during reconstruction estimation, the corresponding Fisher information is reduced. Consequently, we propose a Defense against Model Inversion Attacks via Feature Purification (DMIAFP). To reduce the Fisher information, DMIAFP hides the private data contained within the features, together with its second-order information (the relationships among private data), by minimizing the first-order and second-order correlations between private data and output features. Additionally, we adopt Principal Inertia Components (PIC) as the correlation metric and derive, via PIC, a theoretical upper bound on the attacker's reconstruction ability, thereby avoiding the poor defensive performance caused by data-driven instability in defense methods that are trained adversarially against inversion models. Experimental results show that our method achieves strong defensive performance and exhibits significant advantages in removing redundant information from features.
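To make the bound the abstract invokes concrete: the classical Cramér-Rao inequality (our notation, shown here for a scalar unbiased estimator; the paper may use a multivariate or biased-estimator variant) lower-bounds the attacker's mean squared reconstruction error by the inverse Fisher information, so a defense that shrinks the Fisher information inflates the attacker's best-case error:

```latex
% Cramér-Rao-style bound: for an unbiased estimate \hat{x} of the
% private data x recovered from the released features z, the
% attacker's MSE is bounded below by the reciprocal of the Fisher
% information that z carries about x.
\mathbb{E}\!\left[(\hat{x} - x)^{2}\right] \;\ge\; \frac{1}{\mathcal{I}(x)},
\qquad
\mathcal{I}(x) \;=\; \mathbb{E}\!\left[\left(\frac{\partial}{\partial x}\,\log p(z \mid x)\right)^{2}\right].
```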
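The purification objective itself is not spelled out in the abstract. As a minimal sketch of what "minimizing the first-order and second-order correlations between private data and output features" could look like, the following PyTorch-style penalties are our own illustrative construction (the helper name `correlation_penalties` is hypothetical, not the paper's implementation): the first-order term penalizes the cross-covariance between input and feature dimensions, and the second-order term penalizes alignment between the two Gram matrices, i.e. how strongly feature-space relations between samples mirror input-space relations.

```python
import torch

def correlation_penalties(x: torch.Tensor, z: torch.Tensor):
    """Illustrative first- and second-order correlation penalties between
    private inputs x and output features z (batch dimension first).
    A sketch of the idea described in the abstract, not DMIAFP's exact loss."""
    x = x.flatten(1).float()
    z = z.flatten(1).float()
    n = x.shape[0]
    xc = x - x.mean(dim=0, keepdim=True)   # center inputs
    zc = z - z.mean(dim=0, keepdim=True)   # center features
    # First-order term: squared cross-covariance between every input
    # dimension and every feature dimension.
    cross_cov = xc.T @ zc / (n - 1)
    first = cross_cov.pow(2).mean()
    # Second-order term: alignment of the sample-wise Gram matrices,
    # penalizing feature-space relations that mirror input-space relations.
    gx = xc @ xc.T
    gz = zc @ zc.T
    second = (gx * gz).mean() / (x.shape[1] * z.shape[1])
    return first, second

# During training, weighted versions of both terms would be added to the
# task loss so the encoder learns features decorrelated from the raw inputs.
```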
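Principal Inertia Components have a standard characterization for discrete variables: they are the squared singular values, excluding the trivial leading value of 1, of the normalized joint-distribution matrix Q = D_X^{-1/2} P_{XY} D_Y^{-1/2}. The paper's use of PIC presumably builds on this definition; the short numpy sketch below (function name ours, assuming fully supported marginals) computes them directly.

```python
import numpy as np

def principal_inertia_components(joint: np.ndarray) -> np.ndarray:
    """Principal inertia components (PICs) of a discrete joint pmf.

    joint[i, j] = P(X = i, Y = j). The PICs are the squared singular
    values of Q = D_X^{-1/2} P D_Y^{-1/2}, excluding the trivial leading
    singular value of 1; the largest remaining PIC is the squared
    Hirschfeld-Gebelein-Rényi maximal correlation between X and Y.
    """
    joint = joint / joint.sum()                 # normalize to a pmf
    px = joint.sum(axis=1)                      # marginal of X
    py = joint.sum(axis=0)                      # marginal of Y
    q = joint / np.sqrt(np.outer(px, py))       # D_X^{-1/2} P D_Y^{-1/2}
    sigma = np.linalg.svd(q, compute_uv=False)  # singular values, descending
    return sigma[1:] ** 2                       # drop trivial sigma_0 = 1

# Example: PICs of a random 4x4 joint distribution.
rng = np.random.default_rng(0)
p = rng.random((4, 4))
print(principal_inertia_components(p))
```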
About the Journal
The IEEE Transactions on Information Forensics and Security covers the sciences, technologies, and applications relating to information forensics, information security, biometrics, surveillance, and systems applications that incorporate these features.