Hierarchical Agglomerative Clustering Using Common Neighbours Similarity

2016 IEEE/WIC/ACM International Conference on Web Intelligence (WI), Pub Date: 2016-10-01, DOI: 10.1109/WI.2016.0093

M. Makrehchi

Pages: 546-551

Citations: 3

Abstract

Hierarchical clustering has been well studied in the machine learning community. Hierarchical clustering algorithms are deterministic and stable, and they do not need a predetermined number of clusters as input. However, their non-linear complexity prevents them from scaling to very large data sets. This paper proposes a new approach that reduces the complexity of hierarchical clustering, improves the purity of the clustering, and reduces the chaining factor. The proposed method has three components: (i) a new combination similarity based on the common-neighbours concept from graph theory; (ii) in every iteration, instead of calculating the centroids of new clusters, new centroids are estimated from the centroids of the previous iteration; and (iii) in each iteration, multiple pairs of objects are merged at the same time instead of only one pair. In addition to the proposed combination similarity, four well-known methods (centroid-based, group-based, complete-link, and single-link) have also been implemented. All five methods are tested and evaluated using two metrics: purity, and imbalance (or chaining factor). We show that the proposed algorithm outperforms the four classic methods.
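The abstract does not give formulas, so the following Python sketch is only an illustration of the ideas it names, not the paper's implementation. It assumes that the common-neighbours similarity of two points is the number of points appearing in both of their k-nearest-neighbour lists, and that a merged cluster's centroid can be obtained as the size-weighted mean of the two previous centroids; the paper's own definitions and estimator may differ. Purity follows the standard definition (fraction of points that belong to the majority class of their cluster). All function names and the parameter k are hypothetical.

```python
import numpy as np


def common_neighbour_similarity(X, k=2):
    """Shared k-nearest-neighbour counts for all pairs of rows in X.

    Assumed definition (not taken from the paper): S[i, j] is the number
    of points that are among the k nearest neighbours of both i and j.
    """
    X = np.asarray(X, dtype=float)
    n = len(X)
    # Pairwise squared Euclidean distances.
    d = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    np.fill_diagonal(d, np.inf)  # a point is not its own neighbour
    nn = np.argsort(d, axis=1)[:, :k]  # indices of the k nearest neighbours
    # A[i, j] = 1 if j is one of the k nearest neighbours of i.
    A = np.zeros((n, n), dtype=int)
    A[np.arange(n)[:, None], nn] = 1
    # (A @ A.T)[i, j] counts neighbours shared by i and j.
    return A @ A.T


def merged_centroid(c_a, n_a, c_b, n_b):
    """Centroid of a merged cluster from the previous centroids and sizes.

    Assumption: a size-weighted mean, which avoids revisiting the members.
    """
    return (n_a * np.asarray(c_a) + n_b * np.asarray(c_b)) / (n_a + n_b)


def purity(labels_true, labels_pred):
    """Fraction of points whose class is the majority class of their cluster."""
    labels_true = np.asarray(labels_true)
    labels_pred = np.asarray(labels_pred)
    total = 0
    for c in np.unique(labels_pred):
        members = labels_true[labels_pred == c]
        _, counts = np.unique(members, return_counts=True)
        total += counts.max()
    return total / len(labels_true)


if __name__ == "__main__":
    # Two well-separated groups of three points each (toy data).
    X = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
    print(common_neighbour_similarity(X, k=2))
    print(merged_centroid([0.0, 0.0], 1, [3.0, 3.0], 2))
    print(purity([0, 0, 1, 1], [0, 0, 0, 1]))
```

In this sketch, merging several pairs per iteration (component iii) would amount to selecting multiple high-similarity pairs from the matrix in one pass instead of only the single best pair; how the paper selects those pairs is not stated in the abstract.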
