Heterogeneity-Aware Clustering and Intra-Cluster Uniform Data Sampling for Federated Learning

IF 5.3 Zone 3 Computer Science Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

IEEE Transactions on Emerging Topics in Computational Intelligence Pub Date : 2024-12-25 DOI:10.1109/TETCI.2024.3515007

Jian Chen;Peifeng Zhang;Jiahui Chen;Terry Shue Chien Lau

Volume 9, Issue 3, pp. 2545-2556. Journal Article. https://ieeexplore.ieee.org/document/10815593/

Citations: 0

Abstract

Federated learning (FL) is a privacy-preserving machine learning paradigm that enables clients to train a global model without sharing their local data. However, category distribution heterogeneity and quantity imbalance frequently coexist in real-world FL scenarios. On the one hand, because of category distribution heterogeneity, local models are optimized toward distinct local objectives, resulting in divergent optimization directions. On the other hand, under the widely used uniform client sampling, quantity imbalance may keep clients with larger datasets from participating actively in model training, potentially leading to suboptimal model performance. To tackle this, we propose a framework that incorporates heterogeneity-aware clustering and intra-cluster uniform data sampling. More precisely, we first perform heterogeneity-aware clustering, which groups clients based on their category distribution vectors. Then, we implement intra-cluster uniform data sampling, in which local data from each client within a cluster is randomly selected with a predetermined probability. Furthermore, to address privacy concerns, we incorporate homomorphic encryption to protect clients' category distribution vectors and sample sizes. Finally, experimental results on multiple benchmark datasets demonstrate the superiority of the proposed framework.
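
The two steps named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say which clustering algorithm is used, so plain k-means stands in here, and the function names (`category_distribution`, `kmeans`, `sample_cluster_data`), the toy label sets, and the probability value are all assumptions made for illustration. Homomorphic encryption is omitted entirely.

```python
import numpy as np

def category_distribution(labels, num_classes):
    """Normalized label histogram of one client (its category distribution vector)."""
    counts = np.bincount(labels, minlength=num_classes).astype(float)
    return counts / counts.sum()

def kmeans(vectors, k, iters=20, seed=0):
    """Plain Lloyd's k-means; an assumed stand-in for the paper's clustering step."""
    rng = np.random.default_rng(seed)
    # Initialize centers from k distinct clients (fancy indexing returns a copy).
    centers = vectors[rng.choice(len(vectors), size=k, replace=False)]
    for _ in range(iters):
        # Distance from every client vector to every center.
        dists = np.linalg.norm(vectors[:, None, :] - centers[None, :, :], axis=2)
        assign = dists.argmin(axis=1)
        for j in range(k):
            members = vectors[assign == j]
            if len(members) > 0:
                centers[j] = members.mean(axis=0)
    return assign

def sample_cluster_data(client_sizes, prob, rng):
    """Select each local example independently with the same probability `prob`.

    Returns a dict mapping client id to the array of selected local indices,
    so a client holding more data contributes more examples in expectation.
    """
    return {cid: np.flatnonzero(rng.random(n) < prob)
            for cid, n in client_sizes.items()}

# Toy setup (hypothetical): four clients, four classes, imbalanced sizes.
client_labels = {
    0: np.array([0] * 100),
    1: np.array([0] * 9 + [1] * 1),
    2: np.array([2] * 50),
    3: np.array([2] * 8 + [3] * 2),
}
num_classes = 4
vectors = np.stack([category_distribution(client_labels[c], num_classes)
                    for c in sorted(client_labels)])
assign = kmeans(vectors, k=2)

# Sample inside the cluster that contains client 0.
rng = np.random.default_rng(1)
cluster_of_0 = [c for c in sorted(client_labels) if assign[c] == assign[0]]
sizes = {c: len(client_labels[c]) for c in cluster_of_0}
selected = sample_cluster_data(sizes, prob=0.5, rng=rng)
```

Because every example in a cluster is kept with the same probability, the expected number of examples a client contributes is proportional to its dataset size, which is one way to read the abstract's remark about clients with larger datasets.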


Source journal

IEEE Transactions on Emerging Topics in Computational Intelligence (Mathematics: Control and Optimization)

CiteScore

10.30

Self-citation rate

7.50%

Articles published

147

Journal description: The IEEE Transactions on Emerging Topics in Computational Intelligence (TETCI) publishes original articles on emerging aspects of computational intelligence, including theory, applications, and surveys. TETCI is an electronic-only publication and publishes six issues per year. Authors are encouraged to submit manuscripts on any emerging topic in computational intelligence, especially nature-inspired computing topics not covered by other IEEE Computational Intelligence Society journals. Illustrative examples include glial cell networks, computational neuroscience, brain-computer interfaces, ambient intelligence, non-fuzzy computing with words, artificial life, cultural learning, artificial endocrine networks, social reasoning, artificial hormone networks, and computational intelligence for the IoT and Smart-X technologies.