Data distribution inference attack in federated learning via reinforcement learning support

Impact factor: 3; JCR: Q2 (Computer Science, Information Systems)

High-Confidence Computing. Pub Date: 2024-05-17. DOI: 10.1016/j.hcc.2024.100235

Dongxiao Yu, Hengming Zhang, Yan Huang, Zhenzhen Xie

High-Confidence Computing, Volume 5, Issue 1, Article 100235. Full text: https://www.sciencedirect.com/science/article/pii/S2667295224000382

Citations: 0

Abstract

Federated Learning (FL) is a widely used collaborative learning framework. Its distinguishing feature is that the clients involved in training do not share raw data; they transfer only model parameters to share knowledge, and together obtain a global model with improved performance. However, recent studies have found that sharing model parameters may still leak private information: local training data can be reconstructed from the shared parameters, which threatens individual privacy and security. We observe that most current attacks target client-specific data reconstruction, while limited attention is paid to information leakage from the global model. In this work, we propose a novel FL attack based on shared model parameters that can deduce the data distribution behind the global model. Unlike other FL attacks, which aim to infer individual clients' raw data, the proposed data distribution inference attack shows that attackers can deduce distribution information about the training data behind the global model. We argue that such information is valuable because the training data behind a well-trained global model reflects the common knowledge of a specific task, for example in social network and e-commerce applications. To implement the attack, our key idea is to use deep reinforcement learning (RL) to guide the attack process: the RL agent automatically adjusts a pseudo-data distribution until it is similar to the ground-truth data distribution. With a carefully designed Markov decision process (MDP), our implementation gives the attack stable performance, and experimental results verify its effectiveness.
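The abstract does not specify the state, action, or reward of the MDP, so the following is only a toy sketch of the general loop it describes: a pseudo-data distribution is adjusted step by step until a signal derived from it matches the signal observed from the global model. Everything here is an assumption made for illustration. The `signal` function (a distribution-weighted mean of hypothetical class prototypes) stands in for whatever the shared model parameters reveal, the action (moving a small amount of probability mass between two classes) and the reward (reduction in squared discrepancy) are invented, and a random-proposal policy with greedy acceptance replaces the deep RL agent. It is not the authors' method.

```python
import random


def signal(dist, prototypes):
    """Hypothetical observable: distribution-weighted mean of class prototypes."""
    dim = len(prototypes[0])
    return [sum(p * proto[d] for p, proto in zip(dist, prototypes))
            for d in range(dim)]


def discrepancy(a, b):
    """Squared Euclidean distance between two signals."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def infer_distribution(target_signal, prototypes, steps=5000,
                       step_size=0.01, seed=0):
    """Adjust a pseudo class distribution until its signal matches the target.

    State:  the current pseudo distribution over classes.
    Action: move `step_size` probability mass from one class to another.
    Reward: decrease in discrepancy to the target signal.
    Policy: random proposals, accepted only if the reward is positive
            (a stand-in for a learned RL policy).
    """
    rng = random.Random(seed)
    k = len(prototypes)
    state = [1.0 / k] * k  # start from the uniform distribution
    best = discrepancy(signal(state, prototypes), target_signal)
    history = [best]
    for _ in range(steps):
        src, dst = rng.sample(range(k), 2)
        delta = min(step_size, state[src])  # never go below zero mass
        if delta == 0:
            continue
        candidate = list(state)
        candidate[src] -= delta
        candidate[dst] += delta
        score = discrepancy(signal(candidate, prototypes), target_signal)
        reward = best - score
        if reward > 0:  # keep only moves that reduce the discrepancy
            state, best = candidate, score
        history.append(best)
    return state, history


# Toy usage: one-hot prototypes make the signal equal to the distribution itself.
prototypes = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
true_dist = [0.1, 0.2, 0.3, 0.4]  # hidden "ground truth" in this toy setup
target = signal(true_dist, prototypes)
estimate, history = infer_distribution(target, prototypes)
print([round(p, 2) for p in estimate])
```

In this toy setup the estimate should move from the uniform starting point toward `true_dist`, and the recorded discrepancy never increases because only improving moves are accepted. A learned policy would replace the random proposals with actions chosen from the current state.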


Source journal

High-Confidence Computing

CiteScore

4.70

Self-citation rate

0.00%

Number of publications