CGKDFL: A Federated Learning Approach Based on Client Clustering and Generator-Based Knowledge Distillation for Heterogeneous Data

IF 1.5 | CAS Quartile 4 (Computer Science) | JCR Q3 (Computer Science, Software Engineering)
Sanfeng Zhang, Hongzhen Xu, Xiaojun Yu
{"title":"CGKDFL:基于客户端聚类和基于生成器的异构数据知识蒸馏的联合学习方法","authors":"Sanfeng Zhang,&nbsp;Hongzhen Xu,&nbsp;Xiaojun Yu","doi":"10.1002/cpe.70048","DOIUrl":null,"url":null,"abstract":"<div>\n \n <p>In practical, real-world complex networks, data distribution is frequently decentralized and Non-Independently Identically Distributed (Non-IID). This heterogeneous data presents a significant challenge for federated learning. Such problems include the generation of biased global models, the lack of sufficient personalization capability of local models, and the difficulty in absorbing global knowledge. We propose a Federated Learning Approach Based on Client Clustering and Generator-based Knowledge Distillation(CGKDFL) for heterogeneous data. Firstly, to reduce the global model bias, we propose a clustering federated learning approach that only requires each client to transmit some of the parameters of the selected layer, thus reducing the number of parameters. Subsequently, to circumvent the absence of global knowledge resulting from clustering, a generator designed to improve privacy features and increase diversity is developed on the server side. This generator produces feature representation data that aligns with the specific tasks of the client by utilizing the labeling information provided by the client. This is achieved without the need for any external dataset. The generator then transfers its global knowledge to the local model. The client can then utilize this information for knowledge distillation. Finally, extensive experiments were conducted on three heterogeneous datasets. The results demonstrate that CGKDFL outperforms the baseline method by a minimum of <span></span><math>\n <semantics>\n <mrow>\n <mn>7</mn>\n <mo>.</mo>\n <mn>24</mn>\n <mo>%</mo>\n </mrow>\n <annotation>$$ 7.24\\% $$</annotation>\n </semantics></math>, <span></span><math>\n <semantics>\n <mrow>\n <mn>6</mn>\n <mo>.</mo>\n <mn>73</mn>\n <mo>%</mo>\n </mrow>\n <annotation>$$ 6.73\\% $$</annotation>\n </semantics></math>, and <span></span><math>\n <semantics>\n <mrow>\n <mn>3</mn>\n <mo>.</mo>\n <mn>13</mn>\n <mo>%</mo>\n </mrow>\n <annotation>$$ 3.13\\% $$</annotation>\n </semantics></math> regarding accuracy on the three heterogeneous datasets. Additionally, it outperforms the compared methods regarding convergence speed in all cases.</p>\n </div>","PeriodicalId":55214,"journal":{"name":"Concurrency and Computation-Practice & Experience","volume":"37 9-11","pages":""},"PeriodicalIF":1.5000,"publicationDate":"2025-04-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"CGKDFL: A Federated Learning Approach Based on Client Clustering and Generator-Based Knowledge Distillation for Heterogeneous Data\",\"authors\":\"Sanfeng Zhang,&nbsp;Hongzhen Xu,&nbsp;Xiaojun Yu\",\"doi\":\"10.1002/cpe.70048\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div>\\n \\n <p>In practical, real-world complex networks, data distribution is frequently decentralized and Non-Independently Identically Distributed (Non-IID). This heterogeneous data presents a significant challenge for federated learning. Such problems include the generation of biased global models, the lack of sufficient personalization capability of local models, and the difficulty in absorbing global knowledge. We propose a Federated Learning Approach Based on Client Clustering and Generator-based Knowledge Distillation(CGKDFL) for heterogeneous data. 
Firstly, to reduce the global model bias, we propose a clustering federated learning approach that only requires each client to transmit some of the parameters of the selected layer, thus reducing the number of parameters. Subsequently, to circumvent the absence of global knowledge resulting from clustering, a generator designed to improve privacy features and increase diversity is developed on the server side. This generator produces feature representation data that aligns with the specific tasks of the client by utilizing the labeling information provided by the client. This is achieved without the need for any external dataset. The generator then transfers its global knowledge to the local model. The client can then utilize this information for knowledge distillation. Finally, extensive experiments were conducted on three heterogeneous datasets. The results demonstrate that CGKDFL outperforms the baseline method by a minimum of <span></span><math>\\n <semantics>\\n <mrow>\\n <mn>7</mn>\\n <mo>.</mo>\\n <mn>24</mn>\\n <mo>%</mo>\\n </mrow>\\n <annotation>$$ 7.24\\\\% $$</annotation>\\n </semantics></math>, <span></span><math>\\n <semantics>\\n <mrow>\\n <mn>6</mn>\\n <mo>.</mo>\\n <mn>73</mn>\\n <mo>%</mo>\\n </mrow>\\n <annotation>$$ 6.73\\\\% $$</annotation>\\n </semantics></math>, and <span></span><math>\\n <semantics>\\n <mrow>\\n <mn>3</mn>\\n <mo>.</mo>\\n <mn>13</mn>\\n <mo>%</mo>\\n </mrow>\\n <annotation>$$ 3.13\\\\% $$</annotation>\\n </semantics></math> regarding accuracy on the three heterogeneous datasets. Additionally, it outperforms the compared methods regarding convergence speed in all cases.</p>\\n </div>\",\"PeriodicalId\":55214,\"journal\":{\"name\":\"Concurrency and Computation-Practice & Experience\",\"volume\":\"37 9-11\",\"pages\":\"\"},\"PeriodicalIF\":1.5000,\"publicationDate\":\"2025-04-08\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Concurrency and Computation-Practice & Experience\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://onlinelibrary.wiley.com/doi/10.1002/cpe.70048\",\"RegionNum\":4,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q3\",\"JCRName\":\"COMPUTER SCIENCE, SOFTWARE ENGINEERING\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Concurrency and Computation-Practice & Experience","FirstCategoryId":"94","ListUrlMain":"https://onlinelibrary.wiley.com/doi/10.1002/cpe.70048","RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"COMPUTER SCIENCE, SOFTWARE ENGINEERING","Score":null,"Total":0}
Citations: 0

Abstract


In real-world complex networks, data are frequently decentralized and non-independent and identically distributed (non-IID). Such heterogeneous data pose significant challenges for federated learning: the global model becomes biased, local models lack sufficient personalization capability, and clients struggle to absorb global knowledge. We propose CGKDFL, a federated learning approach for heterogeneous data based on client clustering and generator-based knowledge distillation. First, to reduce global model bias, we propose a clustered federated learning scheme in which each client transmits only some of the parameters of a selected layer, reducing the volume of parameters communicated. Second, to compensate for the global knowledge lost through clustering, a generator designed to strengthen privacy and increase diversity is built on the server side. Using the label information provided by the clients, and without any external dataset, the generator produces feature representations aligned with each client's task and transfers this global knowledge to the local models, which use it for knowledge distillation. Finally, extensive experiments on three heterogeneous datasets show that CGKDFL outperforms the baseline methods by at least 7.24%, 6.73%, and 3.13% in accuracy on the respective datasets, and converges faster than all compared methods in every case.
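The abstract describes the mechanisms only at a high level. The sketch below is a minimal, hypothetical illustration of the two ideas in PyTorch/NumPy: clients upload the parameters of a single selected layer, the server groups them with a toy k-means, and a label-conditioned generator produces synthetic global features for data-free distillation into a client's classifier head. Every name and dimension here (cluster_clients, LabelConditionedGenerator, distill_to_client, the choice of layer, feature sizes) is our assumption for illustration, not the authors' implementation.

```python
# Minimal sketch of (1) clustering clients on one uploaded layer and
# (2) data-free, generator-based knowledge distillation. Hypothetical
# reconstruction from the abstract, not the paper's actual code.
import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F

def cluster_clients(client_layer_params, num_clusters=3, iters=10):
    """Toy k-means over each client's flattened selected-layer parameters.

    Clients transmit only this one layer, so the communicated parameter
    volume is a fraction of the full model size.
    """
    X = np.stack([p.reshape(-1) for p in client_layer_params])
    rng = np.random.default_rng(0)
    centers = X[rng.choice(len(X), num_clusters, replace=False)]
    for _ in range(iters):
        # assign each client to its nearest cluster centre, then update centres
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        assign = dists.argmin(1)
        for k in range(num_clusters):
            if (assign == k).any():
                centers[k] = X[assign == k].mean(0)
    return assign

class LabelConditionedGenerator(nn.Module):
    """Server-side generator: maps (noise, label) to a feature representation,
    so distillation needs no external dataset, only client label information."""
    def __init__(self, num_classes=10, noise_dim=32, feature_dim=64):
        super().__init__()
        self.embed = nn.Embedding(num_classes, noise_dim)
        self.net = nn.Sequential(
            nn.Linear(noise_dim * 2, 128), nn.ReLU(),
            nn.Linear(128, feature_dim),
        )
    def forward(self, z, y):
        return self.net(torch.cat([z, self.embed(y)], dim=1))

def distill_to_client(generator, client_head, labels, noise_dim=32, temperature=2.0):
    """One distillation step: the client's classifier head learns to recover
    the labels behind the generated global features (a common surrogate for
    data-free knowledge distillation)."""
    z = torch.randn(len(labels), noise_dim)
    with torch.no_grad():
        feats = generator(z, labels)   # global knowledge as synthetic features
    logits = client_head(feats)
    return F.cross_entropy(logits / temperature, labels)

if __name__ == "__main__":
    # toy demo: 6 clients, each uploading one 10x64 classifier layer
    layers = [np.random.randn(10, 64) for _ in range(6)]
    print("cluster assignments:", cluster_clients(layers))

    gen = LabelConditionedGenerator()
    head = nn.Linear(64, 10)           # a client's local classifier head
    labels = torch.randint(0, 10, (16,))
    loss = distill_to_client(gen, head, labels)
    loss.backward()                    # gradients flow only into the head
    print("distillation loss:", float(loss))
```

In this reading, the server never sees raw client data: clustering uses only one layer of weights, and the generator is conditioned solely on labels, which is consistent with the privacy framing in the abstract.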

Source journal
Concurrency and Computation: Practice and Experience (Engineering & Technology, Computer Science: Theory & Methods)
CiteScore: 5.00
Self-citation rate: 10.00%
Annual articles: 664
Review time: 9.6 months
Aims and scope: Concurrency and Computation: Practice and Experience (CCPE) publishes high-quality original research papers and authoritative research review papers in the overlapping fields of: Parallel and distributed computing; High-performance computing; Computational and data science; Artificial intelligence and machine learning; Big data applications, algorithms, and systems; Network science; Ontologies and semantics; Security and privacy; Cloud/edge/fog computing; Green computing; and Quantum computing.