FedCSS: Joint Client-and-Sample Selection for Hard Sample-Aware Noise-Robust Federated Learning

Proceedings of the ACM on Management of Data. Published 2023-11-13. DOI: 10.1145/3617332

Anran Li, Yue Cao, Jiabao Guo, Hongyi Peng, Qing Guo, Han Yu


Abstract

Federated Learning (FL) enables a large number of data owners (a.k.a. FL clients) to jointly train a machine learning model without disclosing their private local data. The importance of local data samples to the FL model varies widely. The problem is exacerbated by noisy data, which exhibit large losses similar to those of important (hard) samples. Currently, no FL approach can effectively distinguish hard samples (which are beneficial) from noisy samples (which are harmful). To bridge this gap, we propose the Federated Client and Sample Selection (FedCSS) approach. It is a bilevel optimization approach to FL client-and-sample selection that achieves hard sample-aware, noise-robust learning in a privacy-preserving manner. It performs meta-learning-based online approximation to iteratively update the global FL model, select the most positively influential samples, and handle training data noise. Theoretical analysis shows that FedCSS is guaranteed to converge efficiently. Experimental comparison against six state-of-the-art baselines on five real-world datasets in the presence of data noise and heterogeneity shows that it achieves up to 26.4% higher test accuracy, while reducing communication and computation costs by at least 41.5% and 1.2%, respectively.
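The abstract does not say how "positive influence" is measured. As a rough illustration only, the sketch below scores samples with a one-step lookahead meta-gradient in the style of the learning-to-reweight technique (Ren et al., ICML 2018). This is a swapped-in technique, not necessarily the FedCSS formulation. Assumptions: a linear model with squared loss, a small clean meta set whose gradient the server broadcasts to clients, and a client utility equal to the sum of positive sample scores. All function names (`influence_scores`, `select_samples`, `select_clients`, etc.) are hypothetical. Under these assumptions, upweighting sample i by eps_i in one SGD step of size lr changes the meta loss at rate -lr * <grad L_meta, g_i>, so a positive score lr * <grad L_meta, g_i> marks a helpful sample. The demo shows why loss alone cannot separate hard from noisy samples.

```python
from typing import Dict, List, Sequence, Tuple

Vector = List[float]
Sample = Tuple[Vector, float]  # (features, regression target)


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def sq_loss(w: Vector, x: Vector, y: float) -> float:
    """Per-sample loss 0.5 * (w.x - y)^2 of a linear model (assumed model)."""
    return 0.5 * (dot(w, x) - y) ** 2


def grad_sq_loss(w: Vector, x: Vector, y: float) -> Vector:
    """Gradient of sq_loss with respect to w: (w.x - y) * x."""
    r = dot(w, x) - y
    return [r * xi for xi in x]


def mean_grad(w: Vector, data: Sequence[Sample]) -> Vector:
    """Average gradient over a dataset, e.g. the (assumed) clean meta set."""
    total = [0.0] * len(w)
    for x, y in data:
        total = [t + g for t, g in zip(total, grad_sq_loss(w, x, y))]
    return [t / len(data) for t in total]


def influence_scores(w: Vector, local: Sequence[Sample],
                     meta_grad: Vector, lr: float) -> List[float]:
    """One-step lookahead: w' = w - lr * sum_i eps_i * g_i.
    dL_meta(w')/d eps_i at eps = 0 equals -lr * <meta_grad, g_i>,
    so score_i = lr * <meta_grad, g_i> > 0 means sample i helps."""
    return [lr * dot(meta_grad, grad_sq_loss(w, x, y)) for x, y in local]


def select_samples(scores: Sequence[float]) -> List[int]:
    """Keep only samples with positive estimated influence."""
    return [i for i, s in enumerate(scores) if s > 0]


def select_clients(client_scores: Dict[str, List[float]], k: int) -> List[str]:
    """Hypothetical client utility: total positive influence; keep the top k."""
    utility = {c: sum(s for s in ss if s > 0) for c, ss in client_scores.items()}
    return sorted(utility, key=lambda c: utility[c], reverse=True)[:k]


if __name__ == "__main__":
    w = [0.0, 0.0]
    meta = [([1.0, 0.0], 1.0), ([0.0, 1.0], 2.0)]  # clean: y = x0 + 2*x1
    g_meta = mean_grad(w, meta)
    client_a = [([1.0, 0.0], 1.0),    # easy, clean
                ([0.0, 3.0], 6.0),    # hard, clean (large loss)
                ([0.0, 3.0], -6.0)]   # noisy label (same large loss)
    client_b = [([1.0, 0.0], -1.0), ([0.0, 1.0], -2.0)]  # all labels corrupted
    scores_a = influence_scores(w, client_a, g_meta, lr=0.1)
    scores_b = influence_scores(w, client_b, g_meta, lr=0.1)
    print([sq_loss(w, x, y) for x, y in client_a])
    print([round(s, 3) for s in scores_a])
    print(select_samples(scores_a))
    print(select_clients({"A": scores_a, "B": scores_b}, k=1))
```

Running the demo prints losses [0.5, 18.0, 18.0] for client A: the hard and noisy samples are indistinguishable by loss. Their scores are [0.05, 1.8, -1.8], so only samples 0 and 1 are kept, and client A is selected over client B, whose samples all score negative. In this sketch only a scalar utility per client needs to leave the device; whether that matches the privacy mechanism of FedCSS is not stated in the abstract.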
