Density Peaks Clustering Based on Label Propagation and K-Mutual-Nearest Neighbors

IF 5.3 | CAS Zone 3, Computer Science | JCR Q1, COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

IEEE Transactions on Emerging Topics in Computational Intelligence | Pub Date: 2024-09-09 | DOI: 10.1109/TETCI.2024.3452687

Liping Sun; Fan Huang; Xiaoyao Zheng; Liangmin Guo; Qingying Yu; Zhenghua Chen; Yonglong Luo

Volume 9, Issue 2, pp. 1830-1842. Full text: https://ieeexplore.ieee.org/document/10670066/

Citations: 0

Abstract

Density peaks clustering is a density-based clustering algorithm. It does not require a preset number of clusters, requires few parameters, and can find clusters of arbitrary shape. However, it performs poorly on datasets with uneven density, requires cluster centers to be selected manually on the decision graph, and suffers from a chain reaction in which the incorrect allocation of a single point causes many other points to be misallocated. To overcome these shortcomings, we propose a density peaks clustering algorithm based on label propagation and k-mutual-nearest neighbors. First, the local density and the distance are defined using the concept of k-mutual-nearest neighbors, which improves clustering performance on datasets whose clusters have uneven density. Second, an adaptive method for selecting cluster centers removes the need for manual selection. Third, an improved label propagation algorithm assigns all remaining points, which addresses the chain reaction problem. Experimental results show that our algorithm accurately identifies cluster centers and obtains high-quality clustering results on synthetic datasets with different characteristics, including datasets with uneven cluster density, convex datasets, manifold datasets, and datasets with inter-cluster contact. On UCI datasets of different types, including small, high-dimensional, and large datasets, our algorithm outperforms the algorithms it is compared against.
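The abstract does not give the paper's formulas, so the following is only a minimal sketch of the two ingredients it names, not the authors' method. It assumes the usual meaning of k-mutual-nearest neighbors (two points are mutual neighbors when each lies among the other's k nearest neighbors) and uses the classic density peaks quantities (Gaussian-kernel local density `rho` and distance `delta` to the nearest denser point) as a stand-in for the paper's redefined versions. The function names and the cutoff parameter `d_c` are illustrative assumptions.

```python
import numpy as np


def pairwise_distances(X):
    """Euclidean distance matrix for the rows of X (shape n x d)."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def k_mutual_nearest_neighbors(D, k):
    """Boolean matrix M with M[i, j] True iff i and j are each among
    the other's k nearest neighbors (assumed definition)."""
    n = D.shape[0]
    D_noself = D.copy()
    np.fill_diagonal(D_noself, np.inf)  # a point is not its own neighbor
    knn_idx = np.argsort(D_noself, axis=1)[:, :k]
    is_knn = np.zeros((n, n), dtype=bool)
    is_knn[np.arange(n)[:, None], knn_idx] = True
    return is_knn & is_knn.T  # keep only reciprocal neighbor relations


def dpc_decision_values(D, d_c):
    """Classic density peaks quantities (baseline, not the paper's variant).

    rho[i]    : Gaussian-kernel local density with cutoff d_c
    delta[i]  : distance to the nearest point of higher density
    parent[i] : index of that point, or -1 for the densest point
    """
    n = D.shape[0]
    rho = np.exp(-(D / d_c) ** 2).sum(axis=1) - 1.0  # remove the self term
    order = np.argsort(-rho, kind="stable")  # densest point first
    delta = np.empty(n)
    parent = np.full(n, -1)
    delta[order[0]] = D[order[0]].max()  # convention for the global peak
    for rank in range(1, n):
        i = order[rank]
        higher = order[:rank]  # points at least as dense, ranked earlier
        j = higher[np.argmin(D[i, higher])]
        delta[i] = D[i, j]
        parent[i] = j
    return rho, delta, parent
```

In the baseline algorithm, points with large `rho * delta` are the candidates that a user would pick as centers on the decision graph, and every other point inherits the label of its `parent`. That inheritance step is where a single wrong assignment can cascade, which is the chain reaction the abstract refers to.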


Source journal

IEEE Transactions on Emerging Topics in Computational Intelligence (Mathematics: Control and Optimization)

CiteScore: 10.30

Self-citation rate: 7.50%

Articles published: 147

About the journal: The IEEE Transactions on Emerging Topics in Computational Intelligence (TETCI) publishes original articles on emerging aspects of computational intelligence, including theory, applications, and surveys. TETCI is an electronic-only publication and publishes six issues per year. Authors are encouraged to submit manuscripts on any emerging topic in computational intelligence, especially nature-inspired computing topics not covered by other IEEE Computational Intelligence Society journals. Illustrative examples include glial cell networks, computational neuroscience, Brain Computer Interface, ambient intelligence, non-fuzzy computing with words, artificial life, cultural learning, artificial endocrine networks, social reasoning, artificial hormone networks, and computational intelligence for the IoT and Smart-X technologies.