Async-HFL: Efficient and Robust Asynchronous Federated Learning in Hierarchical IoT Networks

Proceedings of the 8th ACM/IEEE Conference on Internet of Things Design and Implementation. Pub Date: 2023-01-17. DOI: 10.1145/3576842.3582377

Xiaofan Yu, L. Cherkasova, Hars Vardhan, Quanling Zhao, Emily Ekaireb, Xiyuan Zhang, A. Mazumdar, T. Rosing


Citations: 4

Abstract



Federated Learning (FL) has gained increasing interest in recent years as a distributed on-device learning paradigm. However, multiple challenges remain to be addressed for deploying FL in real-world Internet-of-Things (IoT) networks with hierarchies. Although existing works have proposed various approaches to account for data heterogeneity, system heterogeneity, unexpected stragglers and scalability, none of them provides a systematic solution that addresses all of these challenges in a hierarchical and unreliable IoT network. In this paper, we propose an asynchronous and hierarchical framework (Async-HFL) for performing FL in a common three-tier IoT network architecture. In response to the widely varying networking and system processing delays, Async-HFL employs asynchronous aggregations at both the gateway and cloud levels and thus avoids long waiting times. To fully unleash the potential of Async-HFL in convergence speed under system heterogeneities and stragglers, we design device selection at the gateway level and device-gateway association at the cloud level. The device selection module chooses diverse and fast edge devices to trigger local training in real time, while the device-gateway association module periodically determines an efficient network topology after several cloud epochs, with both modules satisfying bandwidth limitations. We evaluate Async-HFL’s convergence speedup using large-scale simulations based on ns-3 and a network topology from NYCMesh. Our results show that Async-HFL converges 1.08-1.31x faster in wall-clock time and saves up to 21.6% of total communication cost compared to state-of-the-art asynchronous FL algorithms (with client selection). We further validate Async-HFL on a physical deployment and observe its robust convergence under unexpected stragglers.
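The abstract does not state the aggregation rule itself. As a minimal sketch, assuming a generic staleness-weighted mixing update (a common pattern in asynchronous FL, not necessarily the rule Async-HFL uses), the two-level asynchronous aggregation it describes could look like the following. The names `AsyncAggregator`, `staleness_weight` and `base_alpha`, and the square-root decay, are hypothetical choices made only for illustration.

```python
from dataclasses import dataclass
from typing import List


def mix(current: List[float], update: List[float], alpha: float) -> List[float]:
    """Blend an incoming model into the current one: (1 - alpha) * current + alpha * update."""
    return [(1.0 - alpha) * c + alpha * u for c, u in zip(current, update)]


def staleness_weight(base_alpha: float, staleness: int) -> float:
    """Assumed decay: an update trained on an older model version gets a smaller weight."""
    return base_alpha / (1.0 + staleness) ** 0.5


@dataclass
class AsyncAggregator:
    """Used for both tiers in this sketch: a gateway merging device models,
    or the cloud merging gateway models."""
    weights: List[float]
    base_alpha: float = 0.5
    version: int = 0  # incremented each time an update is applied

    def receive(self, update: List[float], trained_on_version: int) -> None:
        # Apply the update as soon as it arrives instead of waiting for other senders.
        staleness = self.version - trained_on_version
        alpha = staleness_weight(self.base_alpha, staleness)
        self.weights = mix(self.weights, update, alpha)
        self.version += 1


cloud = AsyncAggregator(weights=[0.0, 0.0])
gateway = AsyncAggregator(weights=list(cloud.weights))

# A fast device returns a model trained on gateway version 0.
gateway.receive([1.0, 1.0], trained_on_version=0)
# A straggler reports later; it also started from version 0, so its staleness is 1.
gateway.receive([3.0, 3.0], trained_on_version=0)
# The gateway pushes its model to the cloud without waiting for other gateways.
cloud.receive(gateway.weights, trained_on_version=0)
```

In this sketch the straggler's update is still used but is down-weighted, which is one simple way to avoid waiting without discarding slow devices. Device selection and device-gateway association, which the abstract describes as separate modules, are not modeled here.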

