Mei Cao, Huiyu Wang, Yuan Yuan, Jianbo Lu, Xiaojun Cai, Dongxiao Yu, Mengying Zhao
Journal of Systems Architecture, Volume 167, Article 103460. Published 2025-05-30. DOI: 10.1016/j.sysarc.2025.103460. Impact factor: 3.7 (JCR Q1, Computer Science, Hardware & Architecture). Source: https://www.sciencedirect.com/science/article/pii/S1383762125001328
FedMQ+: Towards efficient heterogeneous federated learning with multi-grained quantization
Federated Learning (FL) is a distributed machine learning paradigm that enables collaborative model training while preserving client data privacy. Although this approach significantly enhances data privacy, it introduces substantial communication overhead. Quantization techniques mitigate this challenge by compressing model parameters into fewer bits. However, traditional quantization methods are primarily implemented at the client level, often overlooking the heterogeneous importance of distinct model parameters. In our previous work, FedMQ, we explored quantization mechanisms at both inter-client and intra-client levels to improve communication efficiency. Nevertheless, this approach compromises model accuracy due to the aggregation of models with limited expressive capability. Additionally, it lacks comprehensive theoretical analysis and mathematical verification. In this paper, we propose FedMQ+, an improved framework for heterogeneous federated learning with enhanced dequantization, to optimize global model performance. First, we design a precise dequantization strategy based on normal functions to accurately reconstruct full-precision weights from the given low-precision weights. Next, we conduct a rigorous theoretical analysis of FedMQ+, establish an upper bound for its convergence, and mathematically demonstrate its O(1/T) convergence rate. Finally, we perform extensive experiments across diverse datasets and models. Experimental results demonstrate that FedMQ+ improves convergence speed by 3.1% to 85.8%, while maintaining comparable model accuracy and achieving superior communication efficiency compared with state-of-the-art baselines.
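To make the quantize/dequantize round trip mentioned in the abstract concrete, here is a minimal sketch of generic uniform quantization at several bit widths. It is not the FedMQ+ method: the paper's dequantization is based on normal functions, which this sketch does not attempt to reproduce. The function names `quantize`, `dequantize`, and `max_error`, the choice of min/max scaling, and the synthetic Gaussian weights are all assumptions made for illustration.

```python
import random


def quantize(weights, bits):
    """Map floats to integer codes in [0, 2**bits - 1] (uniform min/max scaling; illustrative only)."""
    lo, hi = min(weights), max(weights)
    levels = (1 << bits) - 1
    # Fall back to a unit scale when all weights are identical.
    scale = (hi - lo) / levels if hi > lo else 1.0
    codes = [round((w - lo) / scale) for w in weights]
    return codes, lo, scale


def dequantize(codes, lo, scale):
    """Reconstruct approximate full-precision values from integer codes."""
    return [lo + c * scale for c in codes]


def max_error(weights, bits):
    """Largest absolute reconstruction error after a quantize/dequantize round trip."""
    codes, lo, scale = quantize(weights, bits)
    restored = dequantize(codes, lo, scale)
    return max(abs(w - r) for w, r in zip(weights, restored))


# Hypothetical stand-in for a client's model weights.
random.seed(0)
weights = [random.gauss(0.0, 1.0) for _ in range(1000)]

# Clients with different bandwidth budgets might use different bit widths.
for bits in (2, 4, 8):
    print(f"{bits}-bit max reconstruction error: {max_error(weights, bits):.5f}")
```

With this scheme the round-trip error per weight is bounded by half the quantization step, so fewer bits means less traffic but a coarser reconstruction. That trade-off is what motivates a more careful dequantization step on the aggregation side.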
About the journal:
The Journal of Systems Architecture: Embedded Software Design (JSA) is a journal covering all design and architectural aspects related to embedded systems and software. It ranges from the microarchitecture level via the system software level up to the application-specific architecture level. Aspects such as real-time systems, operating systems, FPGA programming, programming languages, communications (limited to analysis and the software stack), mobile systems, parallel and distributed architectures as well as additional subjects in the computer and system architecture area will fall within the scope of this journal. Technology will not be a main focus, but its use and relevance to particular designs will be. Case studies are welcome but must contribute more than just a design for a particular piece of software.
Design automation of such systems, including methodologies, techniques, and tools for their design, as well as novel designs of software components fall within the scope of this journal. Novel applications that use embedded systems are also central to this journal. While hardware is not a part of this journal, hardware/software co-design methods that consider the interplay between software and hardware components with an emphasis on software are also relevant here.