Subject Data Auditing via Source Inference Attack in Cross-Silo Federated Learning

IF 3.8 | Zone 2 (Computer Science) | JCR Q2: COMPUTER SCIENCE, INFORMATION SYSTEMS

Journal of Information Security and Applications Pub Date : 2025-03-18 DOI:10.1016/j.jisa.2025.104034

Jiaxin Li, Marco Arazzi, Antonino Nocera, Mauro Conti

Journal of Information Security and Applications, Volume 90, Article 104034. Full text: https://www.sciencedirect.com/science/article/pii/S2214212625000729

Citations: 0

Abstract

Source Inference Attack (SIA) in Federated Learning (FL) aims to identify which client used a target data point for local model training, which allows the central server to audit clients’ data usage. In cross-silo FL, each client (silo) collects data from multiple subjects (e.g., individuals, writers, or devices), which creates a risk of subject information leakage. Subject Membership Inference Attack (SMIA) targets this scenario and attempts to infer whether any client uses data points from a target subject. However, existing results on SMIA are limited and rest on strong assumptions about the attack scenario. We therefore propose a Subject-Level Source Inference Attack (SLSIA) that removes two critical limitations: the SIA constraint that only one client can use a target data point, and SMIA’s imprecise detection of which clients use the target subject’s data. The attacker, positioned on the server side, controls a target data source and aims to detect all clients that use data points from the target subject. Our strategy uses a binary attack classifier to predict whether the embeddings returned by a local model on test data from the target subject contain unique patterns indicating that the client trained the model with data from that subject. To achieve this, the attacker locally pre-trains models using data derived from the target subject and then uses them to build a training set for the binary attack classifier. SLSIA significantly outperforms previous methods on four datasets, achieving a maximum average accuracy of 0.88 over 50 target subjects. An analysis of embedding distributions and input feature distances shows that datasets with sparse subjects are more susceptible to the attack. Finally, we propose defending against SLSIA with item-level and subject-level differential privacy mechanisms. With a subject-level differential privacy budget of 22, the attack accuracy decreases by 36% at a utility loss of 20%.
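The attack pipeline described in the abstract (pre-train local models, build a labelled embedding set, train a binary attack classifier, then score every client) can be illustrated with a minimal toy sketch. Everything in it is an assumption made for illustration and not the authors' implementation: `toy_embedding` replaces real model embeddings with synthetic vectors whose mean shifts when the subject's data was used, the attack classifier is a plain logistic regression, and all names and hyperparameters are hypothetical.

```python
import math
import random

DIM = 4  # embedding size (assumed for this toy example)

def toy_embedding(rng, trained_on_subject):
    """Synthetic stand-in for a model's embedding of target-subject test data.

    Assumption: training on the subject shifts the embedding mean. A real
    attack would query actual pre-trained or client models instead.
    """
    shift = 2.0 if trained_on_subject else 0.0
    return [shift + rng.gauss(0.0, 0.3) for _ in range(DIM)]

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def train_attack_classifier(features, labels, lr=0.5, steps=300):
    """Fit a logistic-regression attack classifier with full-batch gradient descent."""
    w, b = [0.0] * DIM, 0.0
    n = len(features)
    for _ in range(steps):
        grad_w, grad_b = [0.0] * DIM, 0.0
        for x, y in zip(features, labels):
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            grad_b += err
            for j in range(DIM):
                grad_w[j] += err * x[j]
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def audit_clients(w, b, client_embeddings, threshold=0.5):
    """Return the ids of all clients classified as having used the subject's data."""
    return sorted(
        cid for cid, x in client_embeddings.items()
        if sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) >= threshold
    )

rng = random.Random(0)
# Step 1: build a labelled set from locally pre-trained models, half trained
# with target-subject data (label 1) and half without it (label 0).
shadow_labels = [i % 2 for i in range(200)]
shadow_features = [toy_embedding(rng, bool(y)) for y in shadow_labels]
# Step 2: train the binary attack classifier on those embeddings.
w, b = train_attack_classifier(shadow_features, shadow_labels)
# Step 3: score every client; several clients may hold the subject's data at once.
truth = {"client_0": True, "client_1": False, "client_2": True, "client_3": False}
client_embeddings = {cid: toy_embedding(rng, used) for cid, used in truth.items()}
flagged = audit_clients(w, b, client_embeddings)
print(flagged)
```

Because each client is scored independently, the sketch can flag any number of clients, which mirrors the abstract's point that SLSIA does not assume a single source. The synthetic embeddings are deliberately easy to separate, so the toy says nothing about attack accuracy on real data.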

Source journal

Journal of Information Security and Applications (Computer Science: Computer Networks and Communications)

CiteScore: 10.90

Self-citation rate: 5.40%

Publication volume: 206

Review time: 56 days

About the journal: Journal of Information Security and Applications (JISA) focuses on original research and practice-driven applications relevant to information security and applications. JISA links a vibrant scientific and research community with industry professionals by offering a clear view of modern problems and challenges in information security and by identifying promising scientific and "best-practice" solutions. JISA issues balance original research work with innovative industrial approaches by internationally renowned information security experts and researchers.