Federated k-means based on clusters backbone.

IF 2.9 · Zone 3, Multidisciplinary Journals · JCR Q1, MULTIDISCIPLINARY SCIENCES

PLoS ONE Pub Date : 2025-06-12 eCollection Date: 2025-01-01 DOI:10.1371/journal.pone.0326145

Zilong Deng, Yizhang Wang, Mustafa Muwafak Alobaedy

PLoS ONE 20(6): e0326145. Journal Article. Full-text PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12161523/pdf/

Citations: 0

Abstract

Federated clustering is a widely used approach to distributed clustering that does not require transmitting raw data. However, it handles Non-Independent and Identically Distributed (Non-IID) data poorly, because accurate global consistency measures are difficult to obtain under Non-IID conditions. To address this issue, we propose FKmeansCB, a federated k-means clustering algorithm based on a cluster backbone. First, each client adds Laplace noise to all of its local data and runs k-means clustering to obtain cluster centers, which faithfully represent the cluster backbone (i.e., the data structures of the clusters). The cluster backbone represents the client's features and can approximately capture the features of data points with different labels in Non-IID situations. The clients then upload these cluster centers to the server. The server aggregates all cluster centers and runs the k-means clustering algorithm to obtain global cluster centers, which are sent back to the clients. Finally, each client assigns all of its data points to the nearest global cluster center to produce the final clustering results. We validated the performance of the proposed algorithm on six datasets, including the large-scale MNIST dataset. Compared with leading non-federated and federated clustering algorithms, FKmeansCB offers significant advantages in both clustering accuracy and running time.
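The four steps in the abstract (noise, local k-means, server-side k-means over the uploaded centers, nearest-center assignment) can be sketched in plain Python. This is an illustrative reading of the abstract, not the authors' implementation: the function names, the noise scale, the number of local centers per client, and the farthest-point initialisation are all assumptions made here, since the abstract does not specify them. The toy data gives each client a single blob, a simple stand-in for a Non-IID split.

```python
import random

def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale`
    # follows a Laplace(0, scale) distribution.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def sq_dist(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def nearest(point, centers):
    # Index of the center closest to `point`.
    return min(range(len(centers)), key=lambda j: sq_dist(point, centers[j]))

def kmeans(points, k, iters=100):
    # Lloyd's algorithm. Farthest-point initialisation is an assumption
    # of this sketch; the abstract does not say how centers are seeded.
    centers = [list(points[0])]
    while len(centers) < k:
        far = max(points, key=lambda p: min(sq_dist(p, c) for c in centers))
        centers.append(list(far))
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centers)].append(p)
        updated = [
            [sum(col) / len(g) for col in zip(*g)] if g else centers[j]
            for j, g in enumerate(groups)
        ]
        if updated == centers:
            break
        centers = updated
    return centers

def client_update(data, k_local, noise_scale, rng):
    # Step 1: perturb local data with Laplace noise, then cluster locally.
    noisy = [[x + laplace_noise(noise_scale, rng) for x in p] for p in data]
    return kmeans(noisy, k_local)

def server_aggregate(uploaded, k_global):
    # Steps 2-3: pool every client's centers and cluster the pool.
    pooled = [c for centers in uploaded for c in centers]
    return kmeans(pooled, k_global)

def client_assign(data, global_centers):
    # Step 4: label each raw local point by its nearest global center.
    return [nearest(p, global_centers) for p in data]

# Toy Non-IID split (hypothetical data): each client sees only one blob.
rng = random.Random(0)
client_a = [[(i % 5) * 0.1, (i // 5) * 0.1] for i in range(20)]
client_b = [[10 + (i % 5) * 0.1, 10 + (i // 5) * 0.1] for i in range(20)]

uploaded = [client_update(d, k_local=2, noise_scale=0.05, rng=rng)
            for d in (client_a, client_b)]
global_centers = server_aggregate(uploaded, k_global=2)
labels_a = client_assign(client_a, global_centers)
labels_b = client_assign(client_b, global_centers)
```

Only cluster centers leave a client in this sketch; the raw points are used locally for the noisy clustering and for the final assignment. In the toy run, the two well-separated blobs end up under different global centers even though neither client ever sees the other's data.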


Source journal: PLoS ONE (Biology)

CiteScore: 6.20

Self-citation rate: 5.40%

Articles published: 14242

Review time: 3.7 months

Journal description: PLOS ONE is an international, peer-reviewed, open-access, online publication. PLOS ONE welcomes reports on primary research from any scientific discipline. It provides:
* Open access: freely accessible online, authors retain copyright
* Fast publication times
* Peer review by expert, practicing researchers
* Post-publication tools to indicate quality and impact
* Community-based dialogue on articles
* Worldwide media coverage