Hierarchical Agglomerative Clustering Using Common Neighbours Similarity
M. Makrehchi
2016 IEEE/WIC/ACM International Conference on Web Intelligence (WI), pp. 546-551, 2016-10-01
DOI: https://doi.org/10.1109/WI.2016.0093
Citation count: 3
Abstract
Hierarchical clustering has been well studied in the machine learning community. Hierarchical clustering algorithms are deterministic and stable, and they do not need a predetermined number of clusters as input. However, they do not scale to very large data sets because of their non-linear complexity. This paper proposes a new approach that reduces the complexity of hierarchical clustering, improves the purity of the resulting clusters, and reduces the chaining factor. The proposed method has three components: (i) a new combination similarity based on the common-neighbours concept from graph theory; (ii) in every iteration, instead of recalculating the centroids of new clusters, new centroids are estimated from the centroids of the previous iteration; and (iii) in each iteration, instead of merging only one pair of objects, multiple pairs are merged at the same time. In addition to the proposed combination similarity, four well-known methods (centroid-based, group-based, complete-link, and single-link) have also been implemented. All five methods are evaluated using two metrics: purity and imbalance, also called the chaining factor. We show that the proposed algorithm outperforms the four classic methods.
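The abstract names its building blocks without defining them, so the following Python sketch only illustrates the general ideas under stated assumptions. The similarity here is assumed to be the number of shared members in two objects' k-nearest-neighbour sets, which may differ from the paper's combination similarity. The centroid update is the size-weighted average of the two previous centroids, which is exact for arithmetic-mean centroids but is not necessarily the paper's estimation rule. Purity follows the standard definition (the fraction of objects carrying the majority label of their cluster). All function names are hypothetical.

```python
from collections import Counter

import numpy as np


def knn_sets(points, k):
    """Return, for each point, the index set of its k nearest neighbours (Euclidean)."""
    diffs = points[:, None, :] - points[None, :, :]
    dist = np.sqrt((diffs ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)  # a point is not counted as its own neighbour
    nearest = np.argsort(dist, axis=1)[:, :k]
    return [set(row.tolist()) for row in nearest]


def common_neighbour_similarity(neigh, i, j):
    """Assumed definition: number of neighbours shared by objects i and j."""
    return len(neigh[i] & neigh[j])


def merged_centroid(c_a, n_a, c_b, n_b):
    """Centroid of a merged cluster, computed from the two previous
    centroids and cluster sizes instead of from the member points."""
    return (n_a * c_a + n_b * c_b) / (n_a + n_b)


def purity(cluster_ids, true_labels):
    """Fraction of objects that carry the majority label of their cluster."""
    by_cluster = {}
    for c, t in zip(cluster_ids, true_labels):
        by_cluster.setdefault(c, Counter())[t] += 1
    majority = sum(max(counts.values()) for counts in by_cluster.values())
    return majority / len(true_labels)


if __name__ == "__main__":
    # Two well-separated toy groups of three points each.
    points = np.array(
        [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]], dtype=float
    )
    neigh = knn_sets(points, k=2)
    print(common_neighbour_similarity(neigh, 0, 1))  # same group
    print(common_neighbour_similarity(neigh, 0, 3))  # different groups
    print(merged_centroid(points[0], 1, points[1:3].mean(axis=0), 2))
    print(purity([0, 0, 0, 1, 1, 1], ["a", "a", "b", "b", "b", "b"]))
```

Computing a merged centroid from the previous centroids avoids revisiting the member points at every merge, which is one plausible reading of component (ii). Merging several pairs per iteration, component (iii), is not shown here because the abstract does not say how the pairs are selected.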