SDW-DPC: An Advanced Clustering Algorithm by Searching Density Peaks using Standard Deviation Weighted Distance
Authors: Juanying Xie, Xingli Liu, Mingzhao Wang, Wenjie Zhang
Published in: 2022 IEEE 8th International Conference on Cloud Computing and Intelligent Systems (CCIS)
Publication date: 2022-11-26
DOI: 10.1109/CCIS57298.2022.10016309 (https://doi.org/10.1109/CCIS57298.2022.10016309)
Citations: 0
Abstract
DPC (clustering by fast search and find of density peaks) is an ingenious and efficient clustering algorithm that discovers cluster centers through its novel decision graph and then clusters the dataset efficiently through its one-step assignment strategy. However, DPC has inherent shortcomings, such as its “Domino Effect”: once a point is assigned to the wrong cluster, many subsequent points are also assigned to wrong clusters, resulting in poor clustering. This shortcoming is due in part to the one-step assignment and in part to the distance metric. DPC uses the Euclidean distance to measure the distance between points. The Euclidean distance assumes by default that every feature contributes equally to the distance between points, but in practice features do not contribute equally. To address this shortcoming, this paper proposes a standard deviation weighted distance to replace the Euclidean distance used in DPC. This distance weights each feature by the standard deviation of that feature over all points in the dataset, so that the distance between points reflects the specific contribution of each feature. The efficient one-step assignment strategy is inherited. The resulting advanced DPC clustering algorithm is referred to as SDW-DPC (Standard Deviation Weighted Distance based Density Peaks Clustering). Extensive experiments on synthetic datasets and real-world datasets from the UCI machine learning repository demonstrate that SDW-DPC outperforms the original DPC and other well-known benchmark clustering algorithms, including AP, DBSCAN and K-means, in terms of clustering accuracy (Acc), adjusted mutual information (AMI), and adjusted Rand index (ARI).
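To make the idea concrete, the following is a minimal Python sketch of density peaks clustering with a standard deviation weighted distance. It is an illustration under stated assumptions, not the authors' implementation. The abstract does not give the weighting formula, so the sketch assumes each feature's weight is its standard deviation divided by the sum of all features' standard deviations, applied inside a weighted Euclidean distance. The function names (sd_weights, weighted_distance, dpc), the cutoff-kernel density, the cutoff parameter d_c, and the selection of centers by the largest product of density and distance are choices made for this sketch.

```python
import math
from statistics import pstdev


def sd_weights(points):
    """Per-feature weights proportional to each feature's standard deviation.

    Assumption: weights are normalised to sum to 1. The abstract does not
    state the exact formula, so this is only one plausible reading.
    """
    dims = len(points[0])
    sds = [pstdev([p[k] for p in points]) for k in range(dims)]
    total = sum(sds)
    if total == 0:  # every point identical: fall back to equal weights
        return [1.0 / dims] * dims
    return [s / total for s in sds]


def weighted_distance(x, y, w):
    """Weighted Euclidean distance: sqrt(sum_k w_k * (x_k - y_k)^2)."""
    return math.sqrt(sum(wk * (a - b) ** 2 for wk, a, b in zip(w, x, y)))


def dpc(points, d_c, n_clusters):
    """Hypothetical minimal DPC using the weighted distance above."""
    n = len(points)
    w = sd_weights(points)
    dist = [[weighted_distance(points[i], points[j], w) for j in range(n)]
            for i in range(n)]

    # Local density rho_i: number of other points closer than the cutoff d_c.
    rho = [sum(1 for j in range(n) if j != i and dist[i][j] < d_c)
           for i in range(n)]

    # Visit points by decreasing density (ties broken by index).
    order = sorted(range(n), key=lambda i: (-rho[i], i))

    # delta_i: distance to the nearest point that comes earlier in the order;
    # parent[i] remembers that point for the one-step assignment.
    delta = [0.0] * n
    parent = [-1] * n
    for pos, i in enumerate(order):
        if pos == 0:
            delta[i] = max(dist[i])  # densest point gets the largest distance
            continue
        j = min(order[:pos], key=lambda h: dist[i][h])
        delta[i] = dist[i][j]
        parent[i] = j

    # Decision value gamma_i = rho_i * delta_i; the largest ones are centers.
    centers = sorted(order, key=lambda i: -rho[i] * delta[i])[:n_clusters]

    labels = [-1] * n
    for cluster_id, c in enumerate(centers):
        labels[c] = cluster_id

    # One-step assignment: each remaining point inherits its parent's label.
    for i in order:
        if labels[i] == -1:
            labels[i] = labels[parent[i]]
    return labels, centers


if __name__ == "__main__":
    # Toy data: two groups separated along the first feature only.
    points = [(0.0, 0.0), (0.5, 0.2), (0.2, -0.3), (-0.4, 0.1), (0.1, 0.4),
              (10.0, 0.0), (10.5, 0.2), (10.2, -0.3), (9.6, 0.1), (10.1, 0.4)]
    labels, centers = dpc(points, d_c=2.0, n_clusters=2)
    print(labels, centers)
```

The assignment loop also shows where the “Domino Effect” comes from: every point copies the label of its parent, so one wrong label propagates to all points that descend from it. Changing the distance changes which point becomes each parent, which is the lever the weighted distance acts on.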