Accelerating Communication-Efficient Federated Multi-Task Learning With Personalization and Fairness

IF 5.6 · Zone 2 (Computer Science) · JCR Q1 (Computer Science, Theory & Methods)

IEEE Transactions on Parallel and Distributed Systems Pub Date : 2024-06-10 DOI:10.1109/TPDS.2024.3411815

Renyou Xie;Chaojie Li;Xiaojun Zhou;Zhaoyang Dong

Citations: 0

Abstract

Federated learning provides a promising framework for collaboratively training a machine learning model without sharing users' data, offering a way to protect privacy when training models on IoT devices. Nonetheless, data heterogeneity and limited communication resources make it difficult to develop an efficient federated learning algorithm, with the low order of the convergence rate being a particular concern. This can significantly degrade the quality of service for critical machine learning tasks, e.g., facial recognition, which require an edge-ready, low-power, low-latency training algorithm. To address these challenges, this paper proposes a communication-efficient federated learning approach that leverages momentum to accelerate convergence while substantially reducing communication requirements. First, a federated multi-task learning framework is introduced in which the learning tasks are reformulated as a multi-objective optimization problem to address data heterogeneity. The multiple gradient descent algorithm is used to find a common descent direction for all participants, so that common features can be learned without sacrificing any client's performance. Second, to reduce communication costs, a local momentum technique with global information is developed to speed up convergence, and the convergence of the proposed method is analyzed in the non-convex case. It is proved that the proposed local momentum achieves the same acceleration as global momentum, while being more robust than algorithms that rely solely on global momentum for acceleration. Third, the generality of the proposed acceleration approach is investigated and demonstrated through an accelerated variant of FedAvg. Finally, the accuracy, convergence rate, and robustness to data heterogeneity of the proposed method are evaluated empirically on four public datasets, and a real-world IoT platform is constructed to demonstrate its communication efficiency.
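
The abstract refers to the multiple gradient descent algorithm for finding a common descent direction across participants. As a rough illustration only, the sketch below shows the textbook two-gradient case, where the minimum-norm point in the convex hull of two gradients has a closed form. It is not the authors' implementation: the function names (`dot`, `min_norm_weight`, `common_direction`) are hypothetical, the gradients are plain Python lists, and the paper's multi-client setting and momentum scheme are not modeled.

```python
from typing import List, Sequence


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def min_norm_weight(g1: Sequence[float], g2: Sequence[float]) -> float:
    """Weight alpha in [0, 1] minimizing ||alpha*g1 + (1-alpha)*g2||^2.

    Setting the derivative to zero gives
    alpha = (g2 - g1) . g2 / ||g1 - g2||^2, then clipped to [0, 1].
    """
    diff = [b - a for a, b in zip(g1, g2)]  # g2 - g1
    denom = dot(diff, diff)
    if denom == 0.0:
        # Identical gradients: every alpha gives the same vector.
        return 0.5
    alpha = dot(diff, g2) / denom
    return min(1.0, max(0.0, alpha))


def common_direction(g1: Sequence[float], g2: Sequence[float]) -> List[float]:
    """Min-norm convex combination d of g1 and g2 (assumed two-client case).

    Stepping along -d does not increase either objective to first order,
    because d . g1 >= ||d||^2 and d . g2 >= ||d||^2.
    """
    alpha = min_norm_weight(g1, g2)
    return [alpha * a + (1.0 - alpha) * b for a, b in zip(g1, g2)]


if __name__ == "__main__":
    # Two hypothetical client gradients that partly conflict.
    g_a, g_b = [2.0, -1.0], [-1.0, 2.0]
    d = common_direction(g_a, g_b)
    print(d, dot(d, g_a), dot(d, g_b))
```

With more than two clients the minimum-norm point generally has no closed form and is usually found with an iterative solver over the simplex of weights; the two-gradient case is shown here only because it can be verified by hand.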

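Source journal

IEEE Transactions on Parallel and Distributed Systems (Engineering & Technology - Engineering: Electrical & Electronic)

CiteScore: 11.00
Self-citation rate: 9.40%
Articles published: 281
Review time: 5.6 months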

Journal description: IEEE Transactions on Parallel and Distributed Systems (TPDS) is published monthly. It publishes a range of papers, comments on previously published papers, and survey articles that deal with the parallel and distributed systems research areas of current importance to our readers. Particular areas of interest include, but are not limited to:

a) Parallel and distributed algorithms, focusing on topics such as: models of computation; numerical, combinatorial, and data-intensive parallel algorithms; scalability of algorithms and data structures for parallel and distributed systems; communication and synchronization protocols; network algorithms; scheduling; and load balancing.

b) Applications of parallel and distributed computing, including computational and data-enabled science and engineering, big data applications, parallel crowdsourcing, large-scale social network analysis, management of big data, cloud and grid computing, scientific and biomedical applications, mobile computing, and cyber-physical systems.

c) Parallel and distributed architectures, including architectures for instruction-level and thread-level parallelism; design, analysis, implementation, fault resilience and performance measurements of multiple-processor systems; multicore processors, heterogeneous many-core systems; petascale and exascale system designs; novel big data architectures; special purpose architectures, including graphics processors, signal processors, network processors, media accelerators, and other special purpose processors and accelerators; impact of technology on architecture; network and interconnect architectures; parallel I/O and storage systems; architecture of the memory hierarchy; power-efficient and green computing architectures; dependable architectures; and performance modeling and evaluation.

d) Parallel and distributed software, including parallel and multicore programming languages and compilers, runtime systems, operating systems, Internet computing and web services, resource management including green computing, middleware for grids, clouds, and data centers, libraries, performance modeling and evaluation, parallel programming paradigms, and programming environments and tools.