{"title":"MADRL-Based Model Partitioning, Aggregation Control, and Resource Allocation for Cloud-Edge-Device Collaborative Split Federated Learning","authors":"Wenhao Fan;Penghui Chen;Xiongfei Chun;Yuan’an Liu","doi":"10.1109/TMC.2025.3530482","DOIUrl":null,"url":null,"abstract":"Split Federated Learning (SFL) has emerged as a promising paradigm to enhance FL by partitioning the Machine Learning (ML) model into parts and deploying them across clients and servers, effectively mitigating the workload on resource-constrained devices and preserving privacy. Compared to cloud-device-based and edge-device-based SFL, cloud-edge-device collaborative SFL offers both lower communication latency and wider network coverage. However, existing works adopt a uniform model partitioning strategy for different devices, ignoring the heterogeneous nature of device resources. This oversight leads to severe straggler problems, making the training process inefficient. Moreover, they do not consider joint optimization of model aggregation control and computing and communication resource allocation, and lack distributed algorithm design. To address these issues, we propose a joint resource management scheme for cloud-edge-device collaborative SFL to optimize the training latency and energy consumption of all devices. In our scheme, the partitioning strategy is optimized for each device based on resource heterogeneity. Meanwhile, we jointly optimize the aggregation frequency of ML models, computing resource allocation for all devices and edge servers, and transmit power allocation for all devices. We formulate a coordination game among all edge servers and then design a distributed optimization algorithm employing partially observable Multi-Agent Deep Reinforcement Learning (MADRL) with integrated numerical methods. Extensive experiments are conducted to validate the convergence of our algorithm and demonstrate the superiority of our scheme via evaluations under multiple scenarios and in comparison with four reference schemes.","PeriodicalId":50389,"journal":{"name":"IEEE Transactions on Mobile Computing","volume":"24 6","pages":"5324-5341"},"PeriodicalIF":7.7000,"publicationDate":"2025-01-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Mobile Computing","FirstCategoryId":"94","ListUrlMain":"https://ieeexplore.ieee.org/document/10843329/","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, INFORMATION SYSTEMS","Score":null,"Total":0}
Citations: 0
Abstract
Split Federated Learning (SFL) has emerged as a promising paradigm that enhances FL by partitioning the Machine Learning (ML) model and deploying the resulting parts across clients and servers, effectively mitigating the workload on resource-constrained devices while preserving privacy. Compared to cloud-device-based and edge-device-based SFL, cloud-edge-device collaborative SFL offers both lower communication latency and wider network coverage. However, existing works adopt a uniform model partitioning strategy for all devices, ignoring the heterogeneity of device resources. This oversight leads to severe straggler problems that make the training process inefficient. Moreover, these works neither jointly optimize model aggregation control with computing and communication resource allocation, nor provide a distributed algorithm design. To address these issues, we propose a joint resource management scheme for cloud-edge-device collaborative SFL that optimizes the training latency and energy consumption of all devices. In our scheme, the partitioning strategy is optimized for each device based on its available resources. Meanwhile, we jointly optimize the aggregation frequency of the ML models, the computing resource allocation of all devices and edge servers, and the transmit power allocation of all devices. We formulate a coordination game among the edge servers and design a distributed optimization algorithm that combines partially observable Multi-Agent Deep Reinforcement Learning (MADRL) with integrated numerical methods. Extensive experiments validate the convergence of our algorithm and demonstrate the superiority of our scheme across multiple scenarios and against four reference schemes.
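To make the per-device partitioning idea concrete, below is a minimal, hypothetical Python/PyTorch sketch, not the authors' implementation. Each device is assigned its own cut layer derived from its compute capacity, rather than one uniform cut shared by all devices. The toy model, the capacity figures, and the choose_cut_layer heuristic are illustrative assumptions; in the paper the cut is optimized jointly with aggregation frequency and resource allocation via MADRL.

# Minimal sketch (assumed, not the authors' code) of per-device model
# partitioning in SFL: a device-specific cut layer replaces the uniform
# cut criticized in the abstract. Capacity values and the heuristic are
# illustrative assumptions.
import torch
import torch.nn as nn

def build_model() -> nn.Sequential:
    # Toy stand-in for the full ML model to be split.
    return nn.Sequential(
        nn.Linear(32, 64), nn.ReLU(),
        nn.Linear(64, 64), nn.ReLU(),
        nn.Linear(64, 10),
    )

def choose_cut_layer(device_flops: float, total_layers: int) -> int:
    # Hypothetical heuristic: more capable devices keep more layers locally.
    # The paper learns this choice via MADRL; here it is a fixed rule.
    frac = min(device_flops / 1e9, 1.0)  # normalize capacity to [0, 1]
    return max(1, int(frac * (total_layers - 1)))

def split_model(model: nn.Sequential, cut: int):
    # Device-side part runs layers [0, cut); server-side runs [cut, end).
    return nn.Sequential(*model[:cut]), nn.Sequential(*model[cut:])

if __name__ == "__main__":
    devices = {"phone": 2e8, "laptop": 8e8, "board": 5e8}  # FLOP/s, assumed
    for name, flops in devices.items():
        model = build_model()
        cut = choose_cut_layer(flops, len(model))
        client_part, server_part = split_model(model, cut)
        x = torch.randn(4, 32)
        smashed = client_part(x)       # "smashed data" sent over the uplink
        logits = server_part(smashed)  # edge/cloud completes the forward pass
        print(f"{name}: cut={cut}, smashed={tuple(smashed.shape)}, "
              f"out={tuple(logits.shape)}")

In the full scheme, the cut index for each device would be an action chosen by the MADRL agents rather than a static rule, so the partition can adapt as device load and channel conditions change.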
Journal Introduction:
IEEE Transactions on Mobile Computing addresses key technical issues related to various aspects of mobile computing. This includes (a) architectures, (b) support services, (c) algorithm/protocol design and analysis, (d) mobile environments, (e) mobile communication systems, (f) applications, and (g) emerging technologies. Topics of interest span a wide range, including mobile networks and hosts, mobility management, multimedia, operating system support, power management, online and mobile environments, security, scalability, reliability, and emerging technologies such as wearable computers, body area networks, and wireless sensor networks. The journal serves as a comprehensive platform for advancements in mobile computing research.