{"title":"Training Latency Minimization for Model-Splitting Allowed Federated Edge Learning","authors":"Yao Wen;Guopeng Zhang;Kezhi Wang;Kun Yang","doi":"10.1109/TNSE.2025.3544313","DOIUrl":null,"url":null,"abstract":"To alleviate the shortage of computing power faced by clients in training deep neural networks (DNNs) using federated learning (FL), we leverage the <italic>edge computing</i> and <italic>split learning</i> to propose a model-splitting allowed FL (<monospace>SFL</monospace>) framework, with the aim to minimize the training latency without loss of test accuracy. Under the <italic>synchronized global update</i> setting, the latency to complete a round of global training is determined by the maximum latency for the clients to complete a local training session. Therefore, the training latency minimization problem (TLMP) is modelled as a minimizing-maximum problem. To solve this mixed integer nonlinear programming problem, we first propose a <italic>regression method</i> to fit the quantitative-relationship between the <italic>cut-layer</i> and other parameters of an AI-model, and thus, transform the TLMP into a continuous problem. Considering that the two subproblems involved in the TLMP, namely, the <italic>cut-layer selection problem</i> for the clients and the <italic>computing resource allocation problem</i> for the parameter-server are relative independence, an alternate-optimization-based algorithm with polynomial time complexity is developed to obtain a high-quality solution to the TLMP. Extensive experiments are performed on a popular DNN-model <italic>EfficientNetV2</i> using dataset MNIST, and the results verify the validity and improved performance of the proposed <monospace>SFL</monospace> framework.","PeriodicalId":54229,"journal":{"name":"IEEE Transactions on Network Science and Engineering","volume":"12 3","pages":"2081-2092"},"PeriodicalIF":6.7000,"publicationDate":"2025-02-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Network Science and Engineering","FirstCategoryId":"94","ListUrlMain":"https://ieeexplore.ieee.org/document/10897845/","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"ENGINEERING, MULTIDISCIPLINARY","Score":null,"Total":0}
Citations: 0
Abstract
To alleviate the shortage of computing power faced by clients when training deep neural networks (DNNs) with federated learning (FL), we leverage edge computing and split learning to propose a model-splitting allowed FL (SFL) framework that aims to minimize the training latency without loss of test accuracy. Under the synchronized global update setting, the latency of a round of global training is determined by the maximum latency among the clients to complete a local training session. The training latency minimization problem (TLMP) is therefore modelled as a min-max problem. To solve this mixed-integer nonlinear programming problem, we first propose a regression method that fits the quantitative relationship between the cut layer and the other parameters of an AI model, thereby transforming the TLMP into a continuous problem. Because the two subproblems in the TLMP, namely the cut-layer selection problem at the clients and the computing resource allocation problem at the parameter server, are relatively independent, an alternating-optimization-based algorithm with polynomial time complexity is developed to obtain a high-quality solution. Extensive experiments are performed on the popular DNN model EfficientNetV2 with the MNIST dataset, and the results verify the validity and improved performance of the proposed SFL framework.
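The alternating structure described in the abstract can be illustrated with a small toy script. This is only a sketch under assumed simplifications: the latency model (client-side compute + activation upload + remaining server-side compute), the client profiles, the candidate cut points CUTS, the server budget F_TOTAL, and the bisection-based allocation step are hypothetical placeholders, not the paper's actual formulation or data.

```python
# Toy sketch of alternating optimization for a min-max training-latency problem.
# All symbols and the latency model below are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
N = 5                                # number of clients (assumed)
F_TOTAL = 10.0                       # total server compute budget (assumed units)
CUTS = np.linspace(0.1, 0.9, 9)      # candidate cut-layer positions (fraction of the model)

# Hypothetical per-client profiles: device speed, uplink rate, total training workload.
client_speed = rng.uniform(0.5, 2.0, N)
uplink_rate  = rng.uniform(1.0, 4.0, N)
workload     = rng.uniform(5.0, 10.0, N)

def latency(i, cut, f):
    """Toy per-round latency of client i for a given cut point and server frequency f."""
    client_time = cut * workload[i] / client_speed[i]           # layers kept on the client
    comm_time   = (1.0 - 0.5 * cut) / uplink_rate[i]            # assumed: deeper cut -> smaller activations
    server_time = (1.0 - cut) * workload[i] / f                 # layers offloaded to the server
    return client_time + comm_time + server_time

cuts = np.full(N, 0.5)
freq = np.full(N, F_TOTAL / N)

for _ in range(20):
    # Subproblem 1: with the server allocation fixed, each client independently
    # picks the candidate cut that minimizes its own latency.
    for i in range(N):
        cuts[i] = CUTS[np.argmin([latency(i, c, freq[i]) for c in CUTS])]

    # Subproblem 2: with cuts fixed, bisect on a common deadline T and give each
    # client just enough server frequency to meet it, which minimizes the max latency.
    base = np.array([latency(i, cuts[i], np.inf) for i in range(N)])  # client + comm parts only
    rem  = (1.0 - cuts) * workload                                    # server-side workload per client
    lo, hi = base.max(), base.max() + N * rem.sum() / F_TOTAL + 1.0   # hi is always feasible
    for _ in range(60):
        T = 0.5 * (lo + hi)
        need = rem / np.maximum(T - base, 1e-12)                      # frequency needed to finish by T
        lo, hi = (lo, T) if need.sum() <= F_TOTAL else (T, hi)
    freq = rem / np.maximum(hi - base, 1e-12)

print("selected cuts:", np.round(cuts, 2))
print("round latency ~", max(latency(i, cuts[i], freq[i]) for i in range(N)))
```

In this sketch the cut-layer step is a discrete search per client and the allocation step is a one-dimensional bisection, so each outer iteration runs in polynomial time; the paper's actual algorithm and latency model should be consulted for the precise formulation.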
Journal Introduction:
The IEEE Transactions on Network Science and Engineering (TNSE) is committed to the timely publication of peer-reviewed technical articles that deal with the theory and applications of network science and the interconnections among the elements of a system that form a network. In particular, the journal publishes articles on the understanding, prediction, and control of the structures and behaviors of networks at the fundamental level, aiming to discover the common principles that govern network structures, functionalities, and behaviors. The types of networks covered include physical or engineered networks, information networks, biological networks, semantic networks, economic networks, social networks, and ecological networks. Another trans-disciplinary focus of the journal is the interactions between, and co-evolution of, different genres of networks.