{"title":"Decaf: Data Distribution Decompose Attack Against Federated Learning","authors":"Zhiyang Dai;Yansong Gao;Chunyi Zhou;Anmin Fu;Zhi Zhang;Minhui Xue;Yifeng Zheng;Yuqing Zhang","doi":"10.1109/TIFS.2024.3516545","DOIUrl":null,"url":null,"abstract":"In contrast to prevalent Federated Learning (FL) privacy inference techniques such as generative adversarial networks attacks, membership inference attacks, property inference attacks, and model inversion attacks, we devise an innovative privacy threat: the Data Distribution Decompose Attack on FL, termed \n<monospace>Decaf</monospace>\n. This attack enables an honest-but-curious FL server to meticulously profile the proportion of each class owned by the victim FL user, divulging sensitive information like local market item distribution and business competitiveness. The crux of \n<monospace>Decaf</monospace>\n lies in the profound observation that the magnitude of local model gradient changes closely mirrors the underlying data distribution, including the proportion of each class. \n<monospace>Decaf</monospace>\n addresses two crucial challenges: accurately identify the missing/null class(es) given by any victim user as a premise and then quantify the precise relationship between gradient changes and each remaining non-null class. Notably, \n<monospace>Decaf</monospace>\n operates stealthily, rendering it entirely passive and undetectable to victim users regarding the infringement of their data distribution privacy. Experimental validation on five benchmark datasets (MNIST, FASHION-MNIST, CIFAR-10, FER-2013, and SkinCancer) employing diverse model architectures, including customized convolutional networks, standardized VGG16, and ResNet18, demonstrates \n<monospace>Decaf</monospace>\n’s efficacy. Results indicate its ability to accurately decompose local user data distribution, regardless of whether it is IID or non-IID distributed. Specifically, the dissimilarity measured using \n<inline-formula> <tex-math>$L_{\\infty }$ </tex-math></inline-formula>\n distance between the distribution decomposed by \n<monospace>Decaf</monospace>\n and ground truth is consistently below 5% when no null classes exist. Moreover, \n<monospace>Decaf</monospace>\n achieves 100% accuracy in determining any victim user’s null classes, validated through formal proof.","PeriodicalId":13492,"journal":{"name":"IEEE Transactions on Information Forensics and Security","volume":"20 ","pages":"405-420"},"PeriodicalIF":6.3000,"publicationDate":"2024-12-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Information Forensics and Security","FirstCategoryId":"94","ListUrlMain":"https://ieeexplore.ieee.org/document/10795257/","RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, THEORY & METHODS","Score":null,"Total":0}
Citations: 0
Abstract
In contrast to prevalent Federated Learning (FL) privacy inference techniques such as generative adversarial network attacks, membership inference attacks, property inference attacks, and model inversion attacks, we devise an innovative privacy threat: the Data Distribution Decompose Attack on FL, termed Decaf. This attack enables an honest-but-curious FL server to meticulously profile the proportion of each class owned by a victim FL user, divulging sensitive information such as local market item distribution and business competitiveness. The crux of Decaf lies in the observation that the magnitude of local model gradient changes closely mirrors the underlying data distribution, including the proportion of each class. Decaf addresses two crucial challenges: accurately identifying the missing/null class(es) of any victim user as a premise, and then quantifying the precise relationship between gradient changes and each remaining non-null class. Notably, Decaf operates stealthily: it is entirely passive and undetectable to victim users with respect to the infringement of their data distribution privacy. Experimental validation on five benchmark datasets (MNIST, FASHION-MNIST, CIFAR-10, FER-2013, and SkinCancer), employing diverse model architectures including customized convolutional networks, standardized VGG16, and ResNet18, demonstrates Decaf's efficacy. Results indicate its ability to accurately decompose a local user's data distribution, regardless of whether it is IID or non-IID. Specifically, the dissimilarity, measured as the $L_{\infty}$ distance between the distribution decomposed by Decaf and the ground truth, is consistently below 5% when no null classes exist. Moreover, Decaf achieves 100% accuracy in determining any victim user's null classes, validated through formal proof.
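The core intuition stated in the abstract, that per-class gradient magnitudes mirror the local class proportions, can be reproduced in miniature. The sketch below is a toy illustration under assumed conditions (a freshly initialised softmax head on synthetic, well-separated data), not the paper's actual Decaf procedure; all shapes, counts, and hyper-parameters are made up:

```python
# Toy illustration (not the paper's Decaf procedure itself): for a freshly
# initialised softmax classifier, the per-class magnitude of the output-layer
# gradient after one local pass grows with how much data of each class the
# client holds. All dataset shapes and hyper-parameters are illustrative.
import numpy as np

rng = np.random.default_rng(0)
num_classes = 10
num_features = 10                 # one well-separated mean direction per class

# Skewed local data distribution of the "victim" client; class 9 is null.
true_proportions = np.array(
    [0.30, 0.20, 0.15, 0.10, 0.08, 0.07, 0.05, 0.03, 0.02, 0.00])
counts = (true_proportions * 1000).astype(int)

means = 4.0 * np.eye(num_classes, num_features)          # per-class means
X = np.vstack([rng.normal(means[c], 0.5, size=(n, num_features))
               for c, n in enumerate(counts) if n > 0])
y = np.concatenate([np.full(n, c) for c, n in enumerate(counts) if n > 0])

# One gradient computation for a softmax (multinomial logistic-regression) head.
W = np.zeros((num_classes, num_features))
logits = X @ W.T
probs = np.exp(logits - logits.max(axis=1, keepdims=True))
probs /= probs.sum(axis=1, keepdims=True)
onehot = np.eye(num_classes)[y]
grad_W = (probs - onehot).T @ X / len(X)   # dL/dW, shape (classes, features)

# Per-class gradient magnitude, normalised so it can be read as a distribution.
grad_mag = np.linalg.norm(grad_W, axis=1)
estimate = grad_mag / grad_mag.sum()
print("true proportions :", np.round(true_proportions, 3))
print("gradient estimate:", np.round(estimate, 3))
```

Running this shows the normalised per-class gradient magnitudes growing with the true class proportions, which is the kind of signal the attack exploits; recovering the proportions precisely, and handling null classes, is what the paper's actual method addresses.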
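The reported accuracy is stated in terms of the $L_{\infty}$ distance between the decomposed distribution and the ground truth, i.e. the largest per-class error. A minimal sketch of that metric follows; the two vectors are made-up example values, not results from the paper:

```python
# Minimal sketch of the dissimilarity metric quoted in the abstract: the
# L-infinity distance between two class-proportion vectors is the largest
# per-class absolute error.
import numpy as np

def l_inf_distance(p: np.ndarray, q: np.ndarray) -> float:
    """Largest absolute difference between two class-proportion vectors."""
    return float(np.max(np.abs(p - q)))

ground_truth = np.array([0.30, 0.20, 0.15, 0.10, 0.08, 0.07, 0.05, 0.03, 0.02, 0.00])
decomposed   = np.array([0.28, 0.21, 0.16, 0.09, 0.08, 0.07, 0.05, 0.04, 0.02, 0.00])

# "Below 5%" in the abstract corresponds to l_inf_distance(...) < 0.05.
print(l_inf_distance(ground_truth, decomposed))   # 0.02
```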
About the journal:
The IEEE Transactions on Information Forensics and Security covers the sciences, technologies, and applications relating to information forensics, information security, biometrics, surveillance, and systems applications that incorporate these features.