CGKDFL: A Federated Learning Approach Based on Client Clustering and Generator-Based Knowledge Distillation for Heterogeneous Data

Authors: Sanfeng Zhang, Hongzhen Xu, Xiaojun Yu
DOI: 10.1002/cpe.70048 (https://onlinelibrary.wiley.com/doi/10.1002/cpe.70048)
Journal: Concurrency and Computation: Practice and Experience, vol. 37, no. 9-11
Published: 2025-04-08 · Impact factor: 1.5 · JCR: Q3 (Computer Science, Software Engineering)
Citations: 0
Abstract
In real-world complex networks, data are frequently decentralized and non-independently and identically distributed (Non-IID). Such heterogeneous data pose a significant challenge for federated learning: the global model becomes biased, local models lack sufficient personalization capability, and global knowledge is difficult to absorb. We propose a Federated Learning approach based on Client Clustering and Generator-based Knowledge Distillation (CGKDFL) for heterogeneous data. First, to reduce global model bias, we propose a clustered federated learning scheme that requires each client to transmit only some of the parameters of selected layers, thereby reducing the number of transmitted parameters. Second, to compensate for the loss of global knowledge caused by clustering, a generator designed to protect privacy and increase diversity is developed on the server side. Using the label information provided by each client, this generator produces feature-representation data aligned with that client's task, without requiring any external dataset. The generator then transfers its global knowledge to the local models, which the clients use for knowledge distillation. Finally, extensive experiments were conducted on three heterogeneous datasets. The results demonstrate that CGKDFL outperforms the baseline methods by at least 7.24%, 6.73%, and 3.13% in accuracy on the three heterogeneous datasets, respectively, and converges faster than the compared methods in all cases.
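The distillation pipeline the abstract describes, where a server-side generator conditioned on client-supplied labels produces feature representations that clients then distill from, can be sketched roughly as follows. This is a minimal NumPy illustration, not the authors' implementation: the generator here is a hypothetical random linear map over a one-hot label plus noise, and `kd_loss` is the standard temperature-scaled KL-divergence distillation loss, which the paper's client-side update is assumed to resemble.

```python
import numpy as np

def softmax(z, T=1.0):
    """Temperature-scaled softmax over the last axis."""
    z = np.asarray(z, dtype=float) / T
    z = z - z.max(axis=-1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def kd_loss(student_logits, teacher_logits, T=2.0):
    """KL(teacher || student) on temperature-softened distributions,
    scaled by T^2 as is conventional in knowledge distillation."""
    p = softmax(teacher_logits, T)
    q = softmax(student_logits, T)
    return float(np.sum(p * (np.log(p + 1e-12) - np.log(q + 1e-12))) * T * T)

rng = np.random.default_rng(0)

def generate_features(labels, num_classes=10, feat_dim=32, noise_dim=8):
    """Hypothetical server-side generator: maps (one-hot label, noise)
    to a feature vector via a fixed random linear layer. The noise input
    stands in for the diversity mechanism mentioned in the abstract."""
    W = rng.standard_normal((num_classes + noise_dim, feat_dim)) * 0.1
    onehot = np.eye(num_classes)[labels]
    z = rng.standard_normal((len(labels), noise_dim))
    return np.concatenate([onehot, z], axis=1) @ W

# Server generates task-aligned features for labels reported by a client;
# the client would feed these through its local head and minimize kd_loss
# against the generator-side (teacher) predictions.
feats = generate_features(np.array([0, 1, 2]))
loss = kd_loss([[2.0, 0.5, 0.1]], [[1.8, 0.6, 0.2]])
```

Note that only label information and partial model parameters cross the network in this scheme, which is what lets the method avoid any external or raw client dataset.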
About the journal:
Concurrency and Computation: Practice and Experience (CCPE) publishes high-quality original research papers and authoritative research review papers in the overlapping fields of:
Parallel and distributed computing;
High-performance computing;
Computational and data science;
Artificial intelligence and machine learning;
Big data applications, algorithms, and systems;
Network science;
Ontologies and semantics;
Security and privacy;
Cloud/edge/fog computing;
Green computing; and
Quantum computing.