联邦学习的样本级数据选择

IEEE INFOCOM 2021 - IEEE Conference on Computer Communications Pub Date : 2021-05-10 DOI:10.1109/INFOCOM42981.2021.9488723

Anran Li, Lan Zhang, Juntao Tan, Yaxuan Qin, Junhao Wang, Xiangyang Li

{"title":"联邦学习的样本级数据选择","authors":"Anran Li, Lan Zhang, Juntao Tan, Yaxuan Qin, Junhao Wang, Xiangyang Li","doi":"10.1109/INFOCOM42981.2021.9488723","DOIUrl":null,"url":null,"abstract":"Federated learning (FL) enables participants to collaboratively construct a global machine learning model without sharing their local training data to the remote server. In FL systems, the selection of training samples has a significant impact on model performances, e.g., selecting participants whose datasets have erroneous samples, skewed categorical distributions, and low content diversity would result in low accuracy and unstable models. In this work, we aim to solve the exigent optimization problem that selects a collection of high-quality training samples for a given FL task under a monetary budget in a privacy-preserving way, which is extremely challenging without visibility to participants’ local data and training process. We provide a systematic analysis of important data related factors affecting the model performance and propose a holistic design to privately and efficiently select high-quality data samples considering all these factors. We verify the merits of our proposed solution with extensive experiments on a real AIoT system with 50 clients, including 20 edge computers, 20 laptops, and 10 desktops. The experimental results validates that our solution achieves accurate and efficient selection of high-quality data samples, and consequently an FL model with a faster convergence speed and higher accuracy than that achieved by existing solutions.","PeriodicalId":293079,"journal":{"name":"IEEE INFOCOM 2021 - IEEE Conference on Computer Communications","volume":"361 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-05-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"55","resultStr":"{\"title\":\"Sample-level Data Selection for Federated Learning\",\"authors\":\"Anran Li, Lan Zhang, Juntao Tan, Yaxuan Qin, Junhao Wang, Xiangyang Li\",\"doi\":\"10.1109/INFOCOM42981.2021.9488723\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Federated learning (FL) enables participants to collaboratively construct a global machine learning model without sharing their local training data to the remote server. In FL systems, the selection of training samples has a significant impact on model performances, e.g., selecting participants whose datasets have erroneous samples, skewed categorical distributions, and low content diversity would result in low accuracy and unstable models. In this work, we aim to solve the exigent optimization problem that selects a collection of high-quality training samples for a given FL task under a monetary budget in a privacy-preserving way, which is extremely challenging without visibility to participants’ local data and training process. We provide a systematic analysis of important data related factors affecting the model performance and propose a holistic design to privately and efficiently select high-quality data samples considering all these factors. We verify the merits of our proposed solution with extensive experiments on a real AIoT system with 50 clients, including 20 edge computers, 20 laptops, and 10 desktops. The experimental results validates that our solution achieves accurate and efficient selection of high-quality data samples, and consequently an FL model with a faster convergence speed and higher accuracy than that achieved by existing solutions.\",\"PeriodicalId\":293079,\"journal\":{\"name\":\"IEEE INFOCOM 2021 - IEEE Conference on Computer Communications\",\"volume\":\"361 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2021-05-10\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"55\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"IEEE INFOCOM 2021 - IEEE Conference on Computer Communications\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/INFOCOM42981.2021.9488723\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE INFOCOM 2021 - IEEE Conference on Computer Communications","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/INFOCOM42981.2021.9488723","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 55

摘要

联邦学习(FL)使参与者能够协作构建全局机器学习模型，而无需将本地训练数据共享给远程服务器。在FL系统中，训练样本的选择对模型的性能有很大的影响，例如，选择样本错误、分类分布偏斜、内容多样性低的数据集的参与者，会导致模型精度低、不稳定。在这项工作中，我们的目标是解决紧急优化问题，即在货币预算下以保护隐私的方式为给定的FL任务选择一组高质量的训练样本，这是一个极具挑战性的问题，无法看到参与者的本地数据和训练过程。我们对影响模型性能的重要数据相关因素进行了系统分析，并提出了一种综合考虑这些因素的整体设计，以私密、高效地选择高质量的数据样本。我们在一个真正的AIoT系统上进行了广泛的实验，验证了我们提出的解决方案的优点，该系统有50个客户端，包括20台边缘计算机、20台笔记本电脑和10台台式电脑。实验结果验证了我们的解决方案能够准确有效地选择高质量的数据样本，从而获得比现有解决方案更快的收敛速度和更高的精度的FL模型。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Sample-level Data Selection for Federated Learning

Federated learning (FL) enables participants to collaboratively construct a global machine learning model without sharing their local training data to the remote server. In FL systems, the selection of training samples has a significant impact on model performances, e.g., selecting participants whose datasets have erroneous samples, skewed categorical distributions, and low content diversity would result in low accuracy and unstable models. In this work, we aim to solve the exigent optimization problem that selects a collection of high-quality training samples for a given FL task under a monetary budget in a privacy-preserving way, which is extremely challenging without visibility to participants’ local data and training process. We provide a systematic analysis of important data related factors affecting the model performance and propose a holistic design to privately and efficiently select high-quality data samples considering all these factors. We verify the merits of our proposed solution with extensive experiments on a real AIoT system with 50 clients, including 20 edge computers, 20 laptops, and 10 desktops. The experimental results validates that our solution achieves accurate and efficient selection of high-quality data samples, and consequently an FL model with a faster convergence speed and higher accuracy than that achieved by existing solutions.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

IEEE INFOCOM 2021 - IEEE Conference on Computer Communications

自引率

0.00%

发文量