{"title":"MADRL-Based Model Partitioning, Aggregation Control, and Resource Allocation for Cloud-Edge-Device Collaborative Split Federated Learning","authors":"Wenhao Fan;Penghui Chen;Xiongfei Chun;Yuan’an Liu","doi":"10.1109/TMC.2025.3530482","DOIUrl":null,"url":null,"abstract":"Split Federated Learning (SFL) has emerged as a promising paradigm to enhance FL by partitioning the Machine Learning (ML) model into parts and deploying them across clients and servers, effectively mitigating the workload on resource-constrained devices and preserving privacy. Compared to cloud-device-based and edge-device-based SFL, cloud-edge-device collaborative SFL offers both lower communication latency and wider network coverage. However, existing works adopt a uniform model partitioning strategy for different devices, ignoring the heterogeneous nature of device resources. This oversight leads to severe straggler problems, making the training process inefficient. Moreover, they do not consider joint optimization of model aggregation control and computing and communication resource allocation, and lack distributed algorithm design. To address these issues, we propose a joint resource management scheme for cloud-edge-device collaborative SFL to optimize the training latency and energy consumption of all devices. In our scheme, the partitioning strategy is optimized for each device based on resource heterogeneity. Meanwhile, we jointly optimize the aggregation frequency of ML models, computing resource allocation for all devices and edge servers, and transmit power allocation for all devices. We formulate a coordination game among all edge servers and then design a distributed optimization algorithm employing partially observable Multi-Agent Deep Reinforcement Learning (MADRL) with integrated numerical methods. Extensive experiments are conducted to validate the convergence of our algorithm and demonstrate the superiority of our scheme via evaluations under multiple scenarios and in comparison with four reference schemes.","PeriodicalId":50389,"journal":{"name":"IEEE Transactions on Mobile Computing","volume":"24 6","pages":"5324-5341"},"PeriodicalIF":7.7000,"publicationDate":"2025-01-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Mobile Computing","FirstCategoryId":"94","ListUrlMain":"https://ieeexplore.ieee.org/document/10843329/","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, INFORMATION SYSTEMS","Score":null,"Total":0}
Citations: 0
Abstract
Split Federated Learning (SFL) has emerged as a promising paradigm that enhances FL by partitioning the Machine Learning (ML) model and deploying the resulting parts across clients and servers, effectively mitigating the workload on resource-constrained devices while preserving privacy. Compared to cloud-device-based and edge-device-based SFL, cloud-edge-device collaborative SFL offers both lower communication latency and wider network coverage. However, existing works adopt a uniform model partitioning strategy for all devices, ignoring the heterogeneity of device resources. This oversight leads to severe straggler problems that make the training process inefficient. Moreover, these works neither jointly optimize model aggregation control with computing and communication resource allocation, nor provide a distributed algorithm design. To address these issues, we propose a joint resource management scheme for cloud-edge-device collaborative SFL that optimizes the training latency and energy consumption of all devices. In our scheme, the partitioning strategy is optimized for each device based on its available resources. Meanwhile, we jointly optimize the aggregation frequency of the ML models, the computing resource allocation of all devices and edge servers, and the transmit power allocation of all devices. We formulate a coordination game among the edge servers and design a distributed optimization algorithm that combines partially observable Multi-Agent Deep Reinforcement Learning (MADRL) with integrated numerical methods. Extensive experiments validate the convergence of our algorithm and demonstrate the superiority of our scheme across multiple scenarios and against four reference schemes.
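To make the per-device partitioning idea concrete, below is a minimal, hypothetical Python/PyTorch sketch, not the authors' implementation. Each device is assigned its own cut layer derived from its compute capacity, rather than one uniform cut shared by all devices. The toy model, the capacity figures, and the choose_cut_layer heuristic are illustrative assumptions; in the paper the cut is optimized jointly with aggregation frequency and resource allocation via MADRL.

# Minimal sketch (assumed, not the authors' code) of per-device model
# partitioning in SFL: a device-specific cut layer replaces the uniform
# cut criticized in the abstract. Capacity values and the heuristic are
# illustrative assumptions.
import torch
import torch.nn as nn

def build_model() -> nn.Sequential:
    # Toy stand-in for the full ML model to be split.
    return nn.Sequential(
        nn.Linear(32, 64), nn.ReLU(),
        nn.Linear(64, 64), nn.ReLU(),
        nn.Linear(64, 10),
    )

def choose_cut_layer(device_flops: float, total_layers: int) -> int:
    # Hypothetical heuristic: more capable devices keep more layers locally.
    # The paper learns this choice via MADRL; here it is a fixed rule.
    frac = min(device_flops / 1e9, 1.0)  # normalize capacity to [0, 1]
    return max(1, int(frac * (total_layers - 1)))

def split_model(model: nn.Sequential, cut: int):
    # Device-side part runs layers [0, cut); server-side runs [cut, end).
    return nn.Sequential(*model[:cut]), nn.Sequential(*model[cut:])

if __name__ == "__main__":
    devices = {"phone": 2e8, "laptop": 8e8, "board": 5e8}  # FLOP/s, assumed
    for name, flops in devices.items():
        model = build_model()
        cut = choose_cut_layer(flops, len(model))
        client_part, server_part = split_model(model, cut)
        x = torch.randn(4, 32)
        smashed = client_part(x)       # "smashed data" sent over the uplink
        logits = server_part(smashed)  # edge/cloud completes the forward pass
        print(f"{name}: cut={cut}, smashed={tuple(smashed.shape)}, "
              f"out={tuple(logits.shape)}")

In the full scheme, the cut index for each device would be an action chosen by the MADRL agents rather than a static rule, so the partition can adapt as device load and channel conditions change.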
Journal Introduction:
IEEE Transactions on Mobile Computing addresses key technical issues related to various aspects of mobile computing. This includes (a) architectures, (b) support services, (c) algorithm/protocol design and analysis, (d) mobile environments, (e) mobile communication systems, (f) applications, and (g) emerging technologies. Topics of interest span a wide range, including mobile networks and hosts, mobility management, multimedia, operating system support, power management, online and mobile environments, security, scalability, reliability, and emerging technologies such as wearable computers, body area networks, and wireless sensor networks. The journal serves as a comprehensive platform for advancements in mobile computing research.