{"title":"FedCSpc: A Cross-Silo Federated Learning System With Error-Bounded Lossy Parameter Compression","authors":"Zhaorui Zhang;Sheng Di;Kai Zhao;Sian Jin;Dingwen Tao;Zhuoran Ji;Benben Liu;Khalid Ayed Alharthi;Jiannong Cao;Franck Cappello","doi":"10.1109/TPDS.2025.3564736","DOIUrl":null,"url":null,"abstract":"Cross-Silo federated learning is widely used for scaling deep neural network (DNN) training over data silos from different locations worldwide while guaranteeing data privacy. Communication has been identified as the main bottleneck when training large-scale models due to large-volume model parameters and gradient transmission across public networks with limited bandwidth. Most previous works focus on gradient compression, while limited work tries to compress parameters that can not be ignored and extremely affect communication performance during the training. To bridge this gap, we propose <italic>FedCSpc:</i> an efficient cross-silo federated learning system with an XAI-driven adaptive parameter compression strategy for large-scale model training. Our work substantially differs from existing gradient compression techniques due to the distinct data features of gradient and parameter. The key contributions of this paper are fourfold. (1) Our designed <italic>FedCSpc</i> proposes to compress the parameter during the training using the state-of-the-art error-bounded lossy compressor – SZ3. (2) We develop an adaptive compression error bound adjustment algorithm to guarantee the model accuracy effectively. (3) We exploit an efficient approach to utilize the idle CPU resources of clients to compress the parameters. (4) We perform a comprehensive evaluation with a wide range of models and benchmarks on a GPU cluster with 65 GPUs. Results show that <italic>FedCSpc</i> can achieve the same model accuracy as FedAvg while reducing the data volume of parameters and gradients in communication by up to 7.39× and 288×, respectively. With 32 clients on a 4 Gb size model, <italic>FedCSpc</i> significantly outperforms FedAvg in wall-clock time in the emulated WAN environment (at the bandwidth of 1 Gbps or lower without loss of generality).","PeriodicalId":13257,"journal":{"name":"IEEE Transactions on Parallel and Distributed Systems","volume":"36 7","pages":"1372-1386"},"PeriodicalIF":5.6000,"publicationDate":"2025-04-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Parallel and Distributed Systems","FirstCategoryId":"94","ListUrlMain":"https://ieeexplore.ieee.org/document/10978107/","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, THEORY & METHODS","Score":null,"Total":0}
Citations: 0
Abstract
Cross-silo federated learning is widely used to scale deep neural network (DNN) training over data silos in different locations worldwide while guaranteeing data privacy. Communication has been identified as the main bottleneck when training large-scale models, because large volumes of model parameters and gradients must be transmitted across public networks with limited bandwidth. Most previous works focus on gradient compression, while little work attempts to compress parameters, whose transmission cost cannot be ignored and strongly affects communication performance during training. To bridge this gap, we propose FedCSpc: an efficient cross-silo federated learning system with an XAI-driven adaptive parameter compression strategy for large-scale model training. Our work differs substantially from existing gradient compression techniques because gradients and parameters have distinct data features. The key contributions of this paper are fourfold. (1) FedCSpc compresses parameters during training using the state-of-the-art error-bounded lossy compressor SZ3. (2) We develop an adaptive compression error-bound adjustment algorithm that effectively guarantees model accuracy. (3) We design an efficient approach that exploits the idle CPU resources of clients to compress the parameters. (4) We perform a comprehensive evaluation with a wide range of models and benchmarks on a GPU cluster with 65 GPUs. Results show that FedCSpc achieves the same model accuracy as FedAvg while reducing the communicated data volume of parameters and gradients by up to 7.39× and 288×, respectively. With 32 clients and a 4 Gb model, FedCSpc significantly outperforms FedAvg in wall-clock time in an emulated WAN environment (at a bandwidth of 1 Gbps or lower, without loss of generality).
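The abstract describes compressing model parameters with a pointwise error-bounded lossy compressor (SZ3) before transmission and adaptively adjusting the error bound to protect accuracy. The sketch below illustrates only the general idea: a plain uniform scalar quantizer stands in for SZ3, and the adaptive rule (adapt_error_bound), its thresholds, and the accuracy-drop signal are hypothetical placeholders, not the paper's algorithm.

```python
# Minimal sketch (not the authors' implementation) of error-bounded lossy
# parameter compression in one cross-silo FL round. A uniform scalar
# quantizer stands in for SZ3; the error-bound adaptation is an assumed heuristic.
import numpy as np

def compress(params: np.ndarray, err_bound: float):
    """Quantize so that the pointwise reconstruction error is <= err_bound."""
    step = 2.0 * err_bound
    codes = np.round(params / step).astype(np.int32)  # integer codes (entropy-codable)
    return codes, step

def decompress(codes: np.ndarray, step: float) -> np.ndarray:
    return codes.astype(np.float32) * step

def adapt_error_bound(err_bound: float, acc_drop: float,
                      tol: float = 0.5, shrink: float = 0.5, grow: float = 1.2) -> float:
    """Hypothetical rule: tighten the bound if accuracy degrades, otherwise relax it."""
    return err_bound * (shrink if acc_drop > tol else grow)

# --- one emulated round with synthetic parameters ---
rng = np.random.default_rng(0)
global_params = rng.standard_normal(1_000_000).astype(np.float32)
err_bound = 1e-2

codes, step = compress(global_params, err_bound)   # server -> clients (compressed)
restored = decompress(codes, step)                 # client side
assert np.max(np.abs(restored - global_params)) <= err_bound + 1e-7

acc_drop = 0.1  # placeholder: accuracy loss vs. an uncompressed baseline
err_bound = adapt_error_bound(err_bound, acc_drop)
print(f"unique codes: {np.unique(codes).size}, next error bound: {err_bound:.4f}")
```

In the actual system, the quantized representation would additionally be entropy-coded by SZ3, and, as the abstract notes, compression work can be offloaded to otherwise idle client CPUs so it overlaps with GPU training.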
Journal Introduction:
IEEE Transactions on Parallel and Distributed Systems (TPDS) is published monthly. It publishes a range of papers, comments on previously published papers, and survey articles that deal with the parallel and distributed systems research areas of current importance to our readers. Particular areas of interest include, but are not limited to:
a) Parallel and distributed algorithms, focusing on topics such as: models of computation; numerical, combinatorial, and data-intensive parallel algorithms, scalability of algorithms and data structures for parallel and distributed systems, communication and synchronization protocols, network algorithms, scheduling, and load balancing.
b) Applications of parallel and distributed computing, including computational and data-enabled science and engineering, big data applications, parallel crowd sourcing, large-scale social network analysis, management of big data, cloud and grid computing, scientific and biomedical applications, mobile computing, and cyber-physical systems.
c) Parallel and distributed architectures, including architectures for instruction-level and thread-level parallelism; design, analysis, implementation, fault resilience and performance measurements of multiple-processor systems; multicore processors, heterogeneous many-core systems; petascale and exascale systems designs; novel big data architectures; special purpose architectures, including graphics processors, signal processors, network processors, media accelerators, and other special purpose processors and accelerators; impact of technology on architecture; network and interconnect architectures; parallel I/O and storage systems; architecture of the memory hierarchy; power-efficient and green computing architectures; dependable architectures; and performance modeling and evaluation.
d) Parallel and distributed software, including parallel and multicore programming languages and compilers, runtime systems, operating systems, Internet computing and web services, resource management including green computing, middleware for grids, clouds, and data centers, libraries, performance modeling and evaluation, parallel programming paradigms, and programming environments and tools.