Online Management for Edge-Cloud Collaborative Continuous Learning: A Two-Timescale Approach

IF 7.7 · CAS Tier 2, Computer Science · Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS
Shaohui Lin;Xiaoxi Zhang;Yupeng Li;Carlee Joe-Wong;Jingpu Duan;Dongxiao Yu;Yu Wu;Xu Chen
DOI: 10.1109/TMC.2024.3451715
Journal: IEEE Transactions on Mobile Computing, vol. 23, no. 12, pp. 14561-14574
Published: 2024-09-02 (Journal Article)
URL: https://ieeexplore.ieee.org/document/10663344/
Citations: 0

Abstract

Online Management for Edge-Cloud Collaborative Continuous Learning: A Two-Timescale Approach
Deep learning (DL) powered real-time applications usually need continuous training using data streams generated over time and across different geographical locations. Enabling data offloading among computation nodes through model training is promising to mitigate the problem that devices generating large datasets may have low computation capability. However, offloading can compromise model convergence and incur communication costs, which must be balanced with the long-term cost spent on computation and model synchronization. Therefore, this paper proposes EdgeC3, a novel framework that can optimize the frequency of model aggregation and dynamic offloading for continuously generated data streams, navigating the trade-off between long-term accuracy and cost. We first provide a new error bound to capture the impacts of data dynamics that are varying over time and heterogeneous across devices, as well as quantifying varied data heterogeneity between local models and the global one. Based on the bound, we design a two-timescale online optimization framework. We periodically learn the synchronization frequency to adapt with uncertain future offloading and network changes. In the finer timescale, we manage online offloading by extending Lyapunov optimization techniques to handle an unconventional setting, where our long-term global constraint can have abruptly changed aggregation frequencies that are decided in the longer timescale. Finally, we theoretically prove the convergence of EdgeC3 by integrating the coupled effects of our two-timescale decisions, and we demonstrate its advantage through extensive experiments performing distributed DL training for different domains.
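The two-timescale design described in the abstract can be illustrated with a toy sketch: an outer (coarse) loop picks a model-synchronization frequency, while an inner (fine) loop makes per-slot offloading decisions via a Lyapunov-style drift-plus-penalty rule, with a virtual queue tracking a long-term cost budget. Everything below (names, constants, the staleness and sync-cost terms) is an illustrative assumption, not the paper's actual EdgeC3 algorithm.

```python
import random

# Illustrative constants -- assumptions, not values from the paper.
V = 10.0           # drift-plus-penalty weight: larger V favors accuracy over cost
COST_BUDGET = 1.0  # long-term average cost allowed per time slot

def offload_decision(queue, gain, cost):
    """Fine-timescale rule: offload iff the V-weighted accuracy gain
    beats the virtual-queue-weighted communication cost."""
    return V * gain >= queue * cost

def run_frame(sync_freq, slots=20, seed=0):
    """Simulate one coarse-timescale frame under a fixed sync frequency."""
    rng = random.Random(seed)
    queue = total_gain = total_cost = 0.0
    for t in range(slots):
        gain = rng.uniform(0.0, 1.0)   # estimated accuracy gain of offloading
        cost = rng.uniform(0.5, 2.0)   # estimated communication cost
        if offload_decision(queue, gain, cost):
            total_gain += gain
            total_cost += cost
            # Virtual queue grows whenever spending exceeds the per-slot budget,
            # pushing future decisions toward frugality (Lyapunov drift).
            queue = max(queue + cost - COST_BUDGET, 0.0)
        else:
            queue = max(queue - COST_BUDGET, 0.0)
        total_gain -= 0.01 * sync_freq          # toy staleness penalty per slot
        if (t + 1) % sync_freq == 0:
            total_cost += 0.2                   # fixed per-sync overhead (assumed)
    return total_gain, total_cost

# Coarse timescale: pick the sync frequency with the best gain-minus-cost.
best_freq = max((2, 5, 10), key=lambda f: run_frame(f)[0] - run_frame(f)[1])
print("chosen sync frequency:", best_freq)
```

In this toy model, syncing rarely cuts synchronization overhead but inflates the staleness penalty, so the outer loop settles on an intermediate frequency; the paper instead learns this frequency online under uncertainty and couples it with the fine-timescale Lyapunov decisions.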
Source journal: IEEE Transactions on Mobile Computing (Engineering & Technology – Telecommunications)
CiteScore: 12.90
Self-citation rate: 2.50%
Articles per year: 403
Review time: 6.6 months
Journal scope: IEEE Transactions on Mobile Computing addresses key technical issues related to various aspects of mobile computing. This includes (a) architectures, (b) support services, (c) algorithm/protocol design and analysis, (d) mobile environments, (e) mobile communication systems, (f) applications, and (g) emerging technologies. Topics of interest span a wide range, covering aspects like mobile networks and hosts, mobility management, multimedia, operating system support, power management, online and mobile environments, security, scalability, reliability, and emerging technologies such as wearable computers, body area networks, and wireless sensor networks. The journal serves as a comprehensive platform for advancements in mobile computing research.