A Two-Stage Clustering Algorithm Based on Improved K-Means and Density Peak Clustering

Na Xiao, Xu Zhou, Xin Huang, Zhibang Yang
DOI: 10.1109/ICBK.2019.00047
Published in: 2019 IEEE International Conference on Big Knowledge (ICBK), November 2019
Citations: 3

Abstract

The density peak clustering algorithm (DPC) has attracted wide attention from researchers since it was proposed. Its advantage lies in its ability to achieve efficient clustering based on two simple assumptions. In DPC, a key step is to manually select the cluster centers according to the decision graph. The quality of the decision graph determines the quality of the selected cluster centers and, in turn, the quality of the clustering result. The quality of the decision graph is determined by the parameter dc. Although the authors of DPC proposed an empirical method for selecting this parameter, it does not work well on many real-world datasets. On these datasets, the user must repeatedly adjust the parameter to obtain a good decision graph, so manually selecting cluster centers is not an easy task. In this paper, combining the clustering ideas of K-means and DPC, we propose a two-stage clustering algorithm, KDPC, that acquires the cluster centers automatically. In the first stage, KDPC uses an improved K-means algorithm to obtain high-quality cluster centers. In the second stage, KDPC clusters the remaining data points following the clustering idea of DPC. Experiments show that KDPC achieves good clustering results on both artificial and real-world datasets. In addition, compared with DPC, KDPC shows better clustering results on datasets whose clusters differ significantly in density.
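The abstract only outlines the two-stage idea, so the following is a minimal, hypothetical Python sketch of it, not the authors' actual method. Plain Lloyd's k-means with farthest-point initialization stands in for the paper's improved K-means (whose details are not given in the abstract); the Gaussian-kernel local density is the standard DPC formulation; the function names (`local_density`, `kdpc_sketch`) and the rule of seeding labels at the points nearest each center are assumptions made for illustration.

```python
import numpy as np

def local_density(X, dc):
    """Gaussian-kernel local density rho_i, as in standard DPC."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    rho = np.exp(-(d / dc) ** 2).sum(axis=1) - 1.0  # subtract the self-term
    return rho, d

def kdpc_sketch(X, k, dc, iters=20):
    # Stage 1 (stand-in): Lloyd's k-means with deterministic
    # farthest-point initialization to obtain cluster centers.
    centers = [X[0]]
    for _ in range(k - 1):
        d2 = np.min([np.linalg.norm(X - c, axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(d2)])
    centers = np.array(centers)
    for _ in range(iters):
        lab = np.argmin(np.linalg.norm(X[:, None] - centers[None], axis=2), axis=1)
        centers = np.array([X[lab == j].mean(axis=0) for j in range(k)])
    # Stage 2: DPC-style assignment. Seed a label at the point nearest
    # each center, then, walking the points in descending density order,
    # let every unlabeled point inherit the label of its nearest
    # neighbor of higher local density.
    rho, d = local_density(X, dc)
    labels = np.full(len(X), -1)
    for j in range(k):
        labels[np.argmin(np.linalg.norm(X - centers[j], axis=1))] = j
    order = np.argsort(-rho)
    for rank, i in enumerate(order):
        if labels[i] != -1:
            continue
        if rank == 0:
            # densest point with no denser neighbor: fall back to nearest center
            labels[i] = np.argmin(np.linalg.norm(centers - X[i], axis=1))
        else:
            higher = order[:rank]  # denser points, already labeled
            labels[i] = labels[higher[np.argmin(d[i, higher])]]
    return labels
```

On two well-separated blobs this recovers the blobs exactly; the point of the second stage is that non-center points never need a manual decision-graph reading, which is the difficulty with plain DPC that the abstract highlights.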