{"title":"联邦学习中私有训练数据的提取","authors":"Jiaheng Wei;Yanjun Zhang;Leo Yu Zhang;Chao Chen;Shirui Pan;Kok-Leong Ong;Jun Zhang;Yang Xiang","doi":"10.1109/TIFS.2025.3558581","DOIUrl":null,"url":null,"abstract":"The utilization of machine learning algorithms in distributed web applications is experiencing significant growth. One notable approach is Federated Learning (FL) Recent research has brought attention to the vulnerability of FL to gradient inversion attacks, which seek to reconstruct the original training samples, posing a substantial threat to client privacy. Most existing gradient inversion attacks, however, require control over the central server and rely on substantial prior knowledge, including information about batch normalization and data distribution. In this study, we introduce Poisoning Gradient Leakage from Client (PGLC), a novel attack method that operates from the clients’ side. For the first time, we demonstrate the feasibility of a client-side adversary with limited knowledge successfully recovering training samples from the aggregated global model. Our approach enables the adversary to employ a malicious model that increases the loss of a specific targeted class of interest. When honest clients employ the poisoned global model, the gradients of samples become distinct in the aggregated update. This allows the adversary to effectively reconstruct private inputs from other clients using the aggregated update. Furthermore, our <sc>PGLC</small> attack exhibits stealthiness against Byzantine-robust aggregation rules (AGRs). Through the optimization of malicious updates and the blending of benign updates with a malicious replacement vector, our method remains undetected by these defense mechanisms. We conducted experiments across various benchmark datasets, considering representative Byzantine-robust AGRs and exploring different FL settings with varying levels of adversary knowledge about the data. Our results consistently demonstrate the ability of <sc>PGLC</small> to extract training data in all tested scenarios.","PeriodicalId":13492,"journal":{"name":"IEEE Transactions on Information Forensics and Security","volume":"20 ","pages":"4525-4540"},"PeriodicalIF":6.3000,"publicationDate":"2025-04-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Extracting Private Training Data in Federated Learning From Clients\",\"authors\":\"Jiaheng Wei;Yanjun Zhang;Leo Yu Zhang;Chao Chen;Shirui Pan;Kok-Leong Ong;Jun Zhang;Yang Xiang\",\"doi\":\"10.1109/TIFS.2025.3558581\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"The utilization of machine learning algorithms in distributed web applications is experiencing significant growth. One notable approach is Federated Learning (FL) Recent research has brought attention to the vulnerability of FL to gradient inversion attacks, which seek to reconstruct the original training samples, posing a substantial threat to client privacy. Most existing gradient inversion attacks, however, require control over the central server and rely on substantial prior knowledge, including information about batch normalization and data distribution. In this study, we introduce Poisoning Gradient Leakage from Client (PGLC), a novel attack method that operates from the clients’ side. For the first time, we demonstrate the feasibility of a client-side adversary with limited knowledge successfully recovering training samples from the aggregated global model. 
Our approach enables the adversary to employ a malicious model that increases the loss of a specific targeted class of interest. When honest clients employ the poisoned global model, the gradients of samples become distinct in the aggregated update. This allows the adversary to effectively reconstruct private inputs from other clients using the aggregated update. Furthermore, our <sc>PGLC</small> attack exhibits stealthiness against Byzantine-robust aggregation rules (AGRs). Through the optimization of malicious updates and the blending of benign updates with a malicious replacement vector, our method remains undetected by these defense mechanisms. We conducted experiments across various benchmark datasets, considering representative Byzantine-robust AGRs and exploring different FL settings with varying levels of adversary knowledge about the data. Our results consistently demonstrate the ability of <sc>PGLC</small> to extract training data in all tested scenarios.\",\"PeriodicalId\":13492,\"journal\":{\"name\":\"IEEE Transactions on Information Forensics and Security\",\"volume\":\"20 \",\"pages\":\"4525-4540\"},\"PeriodicalIF\":6.3000,\"publicationDate\":\"2025-04-08\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"IEEE Transactions on Information Forensics and Security\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://ieeexplore.ieee.org/document/10955239/\",\"RegionNum\":1,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"COMPUTER SCIENCE, THEORY & METHODS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Information Forensics and Security","FirstCategoryId":"94","ListUrlMain":"https://ieeexplore.ieee.org/document/10955239/","RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, THEORY & METHODS","Score":null,"Total":0}
Extracting Private Training Data in Federated Learning From Clients
The use of machine learning algorithms in distributed web applications is growing rapidly, and Federated Learning (FL) is one notable approach. Recent research has drawn attention to the vulnerability of FL to gradient inversion attacks, which seek to reconstruct the original training samples and thus pose a substantial threat to client privacy. Most existing gradient inversion attacks, however, require control over the central server and rely on substantial prior knowledge, including information about batch normalization and the data distribution. In this study, we introduce Poisoning Gradient Leakage from Client (PGLC), a novel attack that operates from the client side. For the first time, we demonstrate that a client-side adversary with limited knowledge can successfully recover training samples from the aggregated global model. Our approach enables the adversary to craft a malicious model that increases the loss on a specific targeted class of interest. When honest clients train on the poisoned global model, the gradients of samples from the targeted class become distinct within the aggregated update, which allows the adversary to effectively reconstruct private inputs from other clients. Furthermore, our PGLC attack remains stealthy against Byzantine-robust aggregation rules (AGRs): by optimizing the malicious updates and blending benign updates with a malicious replacement vector, it evades these defense mechanisms. We conducted experiments on several benchmark datasets, considering representative Byzantine-robust AGRs and exploring different FL settings with varying levels of adversary knowledge about the data. Our results consistently demonstrate the ability of PGLC to extract training data in all tested scenarios.
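To make the mechanics described in the abstract more concrete, the following PyTorch sketch illustrates the two generic building blocks it refers to: a poisoning step that inflates the loss on one targeted class, and a gradient-matching inversion step that recovers an input from an observable (aggregated) update. This is a minimal illustration under simplifying assumptions, not the authors' PGLC implementation; the function names, the auxiliary public batch, and the hyperparameters are all hypothetical.

```python
# Illustrative sketch only: generic class-targeted poisoning and gradient-matching
# inversion, NOT the authors' PGLC code. Assumes a small auxiliary batch that
# contains both targeted and non-targeted samples.
import copy

import torch
import torch.nn.functional as F


def craft_poisoned_model(global_model, aux_batch, target_class, steps=50, lr=0.1):
    """Return a copy of the global model tuned so that its loss on `target_class`
    grows while behaviour on other classes is preserved (hypothetical stand-in
    for the poisoning step described in the abstract)."""
    poisoned = copy.deepcopy(global_model)
    opt = torch.optim.SGD(poisoned.parameters(), lr=lr)
    x, y = aux_batch
    is_target = (y == target_class)
    for _ in range(steps):
        opt.zero_grad()
        logits = poisoned(x)
        # Minimizing this objective raises the loss on the targeted class
        # while keeping the loss on the remaining classes low.
        loss = (-F.cross_entropy(logits[is_target], y[is_target])
                + F.cross_entropy(logits[~is_target], y[~is_target]))
        loss.backward()
        opt.step()
    return poisoned


def invert_from_update(model, observed_grads, input_shape, target_class,
                       iters=2000, lr=0.1):
    """Standard gradient-matching inversion: optimize a dummy input so that the
    gradient it induces on `model` matches the observed (aggregated) update."""
    dummy = torch.randn(1, *input_shape, requires_grad=True)
    label = torch.tensor([target_class])
    opt = torch.optim.Adam([dummy], lr=lr)
    params = [p for p in model.parameters() if p.requires_grad]
    for _ in range(iters):
        opt.zero_grad()
        loss = F.cross_entropy(model(dummy), label)
        grads = torch.autograd.grad(loss, params, create_graph=True)
        # Cosine-distance matching is a common gradient-inversion objective.
        mismatch = sum(1 - F.cosine_similarity(g.flatten(), o.flatten(), dim=0)
                       for g, o in zip(grads, observed_grads))
        mismatch.backward()
        opt.step()
    return dummy.detach()
```

The intuition mirrors the abstract: after poisoning, the aggregated update is dominated by the targeted class, so matching a single-sample gradient against the aggregate becomes feasible. The stealth component against Byzantine-robust AGRs (optimizing malicious updates and blending benign updates with a malicious replacement vector) is not shown in this sketch.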
About the journal:
The IEEE Transactions on Information Forensics and Security covers the sciences, technologies, and applications relating to information forensics, information security, biometrics, surveillance, and systems applications that incorporate these features.