A Two-Stage Clustering Algorithm Based on Improved K-Means and Density Peak Clustering

Na Xiao, Xu Zhou, Xin Huang, Zhibang Yang
DOI: 10.1109/ICBK.2019.00047
Published in: 2019 IEEE International Conference on Big Knowledge (ICBK), November 2019
Citations: 3

Abstract

The density peak clustering algorithm (DPC) has attracted wide attention from researchers since it was proposed. Its advantage lies in its ability to achieve efficient clustering based on two simple assumptions. In DPC, a key step is to manually select the cluster centers according to the decision graph. The quality of the decision graph determines the quality of the selected cluster centers and, in turn, the quality of the clustering result. The quality of the decision graph is determined by the parameter dc. Although the authors of DPC proposed an empirical method for selecting this parameter, it does not work well on many real-world datasets. On these datasets, the user must repeatedly adjust the parameter to obtain a good decision graph, so manually selecting cluster centers is not an easy task. In this paper, combining the clustering ideas of K-means and DPC, we propose a two-stage clustering algorithm, KDPC, that acquires the cluster centers automatically. In the first stage, KDPC uses an improved K-means algorithm to obtain high-quality cluster centers. In the second stage, KDPC clusters the remaining data points following the clustering idea of DPC. Experiments show that KDPC achieves good clustering results on both artificial and real-world datasets. In addition, compared with DPC, KDPC shows better clustering results on datasets whose clusters differ significantly in density.
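The abstract only outlines the two-stage idea, so the following is a minimal, hypothetical Python sketch of it, not the authors' actual method. Plain Lloyd's k-means with farthest-point initialization stands in for the paper's improved K-means (whose details are not given in the abstract); the Gaussian-kernel local density is the standard DPC formulation; the function names (`local_density`, `kdpc_sketch`) and the rule of seeding labels at the points nearest each center are assumptions made for illustration.

```python
import numpy as np

def local_density(X, dc):
    """Gaussian-kernel local density rho_i, as in standard DPC."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    rho = np.exp(-(d / dc) ** 2).sum(axis=1) - 1.0  # subtract the self-term
    return rho, d

def kdpc_sketch(X, k, dc, iters=20):
    # Stage 1 (stand-in): Lloyd's k-means with deterministic
    # farthest-point initialization to obtain cluster centers.
    centers = [X[0]]
    for _ in range(k - 1):
        d2 = np.min([np.linalg.norm(X - c, axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(d2)])
    centers = np.array(centers)
    for _ in range(iters):
        lab = np.argmin(np.linalg.norm(X[:, None] - centers[None], axis=2), axis=1)
        centers = np.array([X[lab == j].mean(axis=0) for j in range(k)])
    # Stage 2: DPC-style assignment. Seed a label at the point nearest
    # each center, then, walking the points in descending density order,
    # let every unlabeled point inherit the label of its nearest
    # neighbor of higher local density.
    rho, d = local_density(X, dc)
    labels = np.full(len(X), -1)
    for j in range(k):
        labels[np.argmin(np.linalg.norm(X - centers[j], axis=1))] = j
    order = np.argsort(-rho)
    for rank, i in enumerate(order):
        if labels[i] != -1:
            continue
        if rank == 0:
            # densest point with no denser neighbor: fall back to nearest center
            labels[i] = np.argmin(np.linalg.norm(centers - X[i], axis=1))
        else:
            higher = order[:rank]  # denser points, already labeled
            labels[i] = labels[higher[np.argmin(d[i, higher])]]
    return labels
```

On two well-separated blobs this recovers the blobs exactly; the point of the second stage is that non-center points never need a manual decision-graph reading, which is the difficulty with plain DPC that the abstract highlights.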