SHARE: Shaping Data Distribution at Edge for Communication-Efficient Hierarchical Federated Learning

Yongheng Deng, Feng Lyu, Ju Ren, Yongmin Zhang, Yuezhi Zhou, Yaoxue Zhang, Yuanyuan Yang

2021 IEEE 41st International Conference on Distributed Computing Systems (ICDCS), July 2021
DOI: 10.1109/ICDCS51616.2021.00012
Citations: 28
Abstract
Federated learning (FL) enables distributed model training over mobile nodes without sharing privacy-sensitive raw data. However, a significant challenge to efficient FL is the prohibitive communication overhead of committing model updates, since frequent cloud model aggregations are usually required to reach a target accuracy, especially when the data distributions at mobile nodes are imbalanced. Pilot experiments verify that frequent cloud model aggregations can be avoided without performance degradation if model aggregations are instead conducted at the edge. To this end, we shed light on the hierarchical federated learning (HFL) framework, in which a subset of distributed nodes is selected as edge aggregators to conduct edge aggregations. In particular, under the HFL framework, we formulate a communication cost minimization (CCM) problem that minimizes the communication cost incurred by edge/cloud aggregations by jointly deciding edge aggregator selection and distributed node association. Inspired by the insight that the potential of HFL lies in the data distribution at edge aggregators, we propose SHARE, i.e., SHaping dAta distRibution at Edge, to transform and solve the CCM problem. In SHARE, we divide the original problem into two sub-problems that minimize the per-round communication cost and the mean Kullback-Leibler divergence of edge aggregator data, respectively, and devise two lightweight algorithms to solve them. Extensive experiments under various settings corroborate the efficacy of SHARE.
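To make the second sub-problem's objective concrete, below is a minimal, illustrative sketch (not the paper's implementation) of the quantity it minimizes: the mean Kullback-Leibler (KL) divergence between each edge aggregator's pooled label distribution and the global label distribution. The node label counts and the two candidate associations are assumed for illustration.

```python
import numpy as np

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) between two discrete label distributions, with smoothing."""
    p = p / p.sum()
    q = q / q.sum()
    return float(np.sum(p * np.log((p + eps) / (q + eps))))

def mean_edge_kl(groups, global_counts):
    """Mean KL divergence of each edge aggregator's pooled node label
    counts from the global label distribution."""
    return float(np.mean([
        kl_divergence(np.sum(g, axis=0), global_counts) for g in groups
    ]))

# Four nodes with imbalanced label counts over two classes (assumed data).
n0, n1 = np.array([90, 10]), np.array([10, 90])
n2, n3 = np.array([80, 20]), np.array([20, 80])
global_counts = n0 + n1 + n2 + n3  # [200, 200]: globally balanced

# Associating similarly skewed nodes leaves each edge aggregator imbalanced;
# associating complementary nodes shapes each edge's data toward the global mix.
print(mean_edge_kl([[n0, n2], [n1, n3]], global_counts))  # ~0.27 (skewed edges)
print(mean_edge_kl([[n0, n1], [n2, n3]], global_counts))  # ~0.00 (balanced edges)
```

The gap between the two printed values reflects the paper's core insight: with the same nodes and the same total data, node association alone can make each edge aggregator's data distribution resemble the global one, which is what allows edge aggregations to stand in for frequent cloud aggregations.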