Straggler-Resilient Federated Learning: Tackling Computation Heterogeneity With Layer-Wise Partial Model Training in Mobile Edge Network

Impact Factor: 7.9 · CAS Tier 2 (Computer Science) · JCR Q1 (Engineering, Multidisciplinary)
Hongda Wu;Ping Wang;C V Aswartha Narayana
DOI: 10.1109/TNSE.2025.3577910
Journal: IEEE Transactions on Network Science and Engineering, vol. 12, no. 6, pp. 4922–4938
Published: 2025-06-06
URL: https://ieeexplore.ieee.org/document/11027799/
Citations: 0

Abstract

Federated Learning (FL) enables many resource-limited devices to train a model collaboratively without data sharing. However, many existing works focus on model-homogeneous FL, where the global and local models are the same size, ignoring the inherently heterogeneous computational capabilities of different devices and restricting resource-constrained devices from contributing to FL. In this paper, we consider model-heterogeneous FL and propose Federated Partial Model Training (FedPMT), where devices with smaller computational capabilities work on partial models (subsets of the global model) and contribute to the global model. Different from Dropout-based partial model generation, which removes neurons in (hidden) model layers at random, model training in FedPMT is achieved from the back-propagation perspective. As such, all devices in FedPMT prioritize the most crucial parts of the global model. Theoretical analysis shows that the proposed partial model training design has a similar convergence rate to the widely adopted Federated Averaging (FedAvg) algorithm, $\mathcal {O}(1/T)$, with the sub-optimality gap enlarged by a constant factor related to the model splitting design in FedPMT. Empirical results show that FedPMT significantly outperforms the existing partial model training designs, FedDrop and HeteroFL, especially on complex tasks. Meanwhile, compared to the popular model-homogeneous benchmark, FedAvg, FedPMT reaches the learning target in a shorter completion time, thus achieving a better trade-off between learning accuracy and completion time.
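The abstract's core idea — layer-wise partial models assigned by device capability, aggregated per layer — can be illustrated with a minimal sketch. This is an assumption-laden toy, not the paper's algorithm: the splitting rule (a device with capability budget `depth` updates only the last `depth` layers, i.e. those nearest the output and first to receive gradients in back-propagation) and the per-layer averaging are simplified stand-ins, and the "gradient" is random noise in place of a real loss gradient.

```python
import numpy as np

# Hypothetical sketch of FedPMT-style layer-wise partial training.
# The splitting rule and aggregation below are illustrative assumptions,
# not the paper's exact specification.

rng = np.random.default_rng(0)
NUM_LAYERS = 4

def init_model():
    """Global model as a list of per-layer weight matrices."""
    return [rng.normal(size=(3, 3)) for _ in range(NUM_LAYERS)]

def local_update(global_model, depth, lr=0.1):
    """Back-propagation-perspective partial training: a device with
    capability `depth` updates only the last `depth` layers (closest to
    the output), leaving earlier layers frozen at the global values."""
    model = [w.copy() for w in global_model]
    for i in range(NUM_LAYERS - depth, NUM_LAYERS):
        grad = rng.normal(size=model[i].shape)  # stand-in for a real gradient
        model[i] -= lr * grad
    return model, depth

def aggregate(global_model, updates):
    """Per-layer averaging: each layer is averaged only over the devices
    that actually trained it; untouched layers keep the global weights."""
    new_model = []
    for i in range(NUM_LAYERS):
        contribs = [m[i] for m, depth in updates if i >= NUM_LAYERS - depth]
        new_model.append(np.mean(contribs, axis=0) if contribs
                         else global_model[i].copy())
    return new_model

global_model = init_model()
capabilities = [4, 2, 1]  # full model, last two layers, output layer only
updates = [local_update(global_model, d) for d in capabilities]
global_model = aggregate(global_model, updates)
```

Because every device trains the output-side layers, the layers nearest the loss receive the most updates — one plausible reading of the abstract's claim that all devices "prioritize the most crucial parts of the global model."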
Source journal
IEEE Transactions on Network Science and Engineering
Category: Engineering – Control and Systems Engineering
CiteScore: 12.60
Self-citation rate: 9.10%
Annual articles: 393
Journal description: The IEEE Transactions on Network Science and Engineering (TNSE) is committed to the timely publication of peer-reviewed technical articles on the theory and applications of network science and the interconnections among the elements in a system that form a network. In particular, TNSE publishes articles on the understanding, prediction, and control of the structures and behaviors of networks at the fundamental level. The types of networks covered include physical or engineered networks, information networks, biological networks, semantic networks, economic networks, social networks, and ecological networks. The journal aims to discover common principles that govern network structures, functionalities, and behaviors. Another trans-disciplinary focus is the interactions between, and co-evolution of, different genres of networks.