分层聚类算法中的聚类合并与分裂

2002 IEEE International Conference on Data Mining, 2002. Proceedings. Pub Date : 2002-12-09 DOI:10.1109/ICDM.2002.1183896

C. Ding, Xiaofeng He

{"title":"分层聚类算法中的聚类合并与分裂","authors":"C. Ding, Xiaofeng He","doi":"10.1109/ICDM.2002.1183896","DOIUrl":null,"url":null,"abstract":"Hierarchical clustering constructs a hierarchy of clusters by either repeatedly merging two smaller clusters into a larger one or splitting a larger cluster into smaller ones. The crucial step is how to best select the next cluster(s) to split or merge. We provide a comprehensive analysis of selection methods and propose several new methods. We perform extensive clustering experiments to test 8 selection methods, and find that the average similarity is the best method in divisive clustering and the minmax linkage is the best in agglomerative clustering. Cluster balance is a key factor to achieve good performance. We also introduce the concept of objective function saturation and clustering target distance to effectively assess the quality of clustering.","PeriodicalId":405340,"journal":{"name":"2002 IEEE International Conference on Data Mining, 2002. Proceedings.","volume":"44 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2002-12-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"183","resultStr":"{\"title\":\"Cluster merging and splitting in hierarchical clustering algorithms\",\"authors\":\"C. Ding, Xiaofeng He\",\"doi\":\"10.1109/ICDM.2002.1183896\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Hierarchical clustering constructs a hierarchy of clusters by either repeatedly merging two smaller clusters into a larger one or splitting a larger cluster into smaller ones. The crucial step is how to best select the next cluster(s) to split or merge. We provide a comprehensive analysis of selection methods and propose several new methods. We perform extensive clustering experiments to test 8 selection methods, and find that the average similarity is the best method in divisive clustering and the minmax linkage is the best in agglomerative clustering. Cluster balance is a key factor to achieve good performance. We also introduce the concept of objective function saturation and clustering target distance to effectively assess the quality of clustering.\",\"PeriodicalId\":405340,\"journal\":{\"name\":\"2002 IEEE International Conference on Data Mining, 2002. Proceedings.\",\"volume\":\"44 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2002-12-09\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"183\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2002 IEEE International Conference on Data Mining, 2002. Proceedings.\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ICDM.2002.1183896\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2002 IEEE International Conference on Data Mining, 2002. Proceedings.","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICDM.2002.1183896","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 183

摘要

分层聚类通过将两个较小的集群重复合并为较大的集群或将较大的集群拆分为较小的集群来构建集群的层次结构。关键的一步是如何最好地选择要拆分或合并的下一个集群。我们对选择方法进行了全面的分析，并提出了几种新的方法。我们对8种选择方法进行了广泛的聚类实验，发现平均相似度是分裂聚类的最佳选择方法，最小最大连接是聚集聚类的最佳选择方法。集群平衡是实现良好性能的关键因素。引入目标函数饱和度和聚类目标距离的概念，有效地评价聚类质量。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Cluster merging and splitting in hierarchical clustering algorithms

Hierarchical clustering constructs a hierarchy of clusters by either repeatedly merging two smaller clusters into a larger one or splitting a larger cluster into smaller ones. The crucial step is how to best select the next cluster(s) to split or merge. We provide a comprehensive analysis of selection methods and propose several new methods. We perform extensive clustering experiments to test 8 selection methods, and find that the average similarity is the best method in divisive clustering and the minmax linkage is the best in agglomerative clustering. Cluster balance is a key factor to achieve good performance. We also introduce the concept of objective function saturation and clustering target distance to effectively assess the quality of clustering.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

2002 IEEE International Conference on Data Mining, 2002. Proceedings.

自引率

0.00%

发文量