Hierarchical Agglomerative Clustering Using Common Neighbours Similarity

M. Makrehchi
2016 IEEE/WIC/ACM International Conference on Web Intelligence (WI), pp. 546-551
Published: 2016-10-01
DOI: 10.1109/WI.2016.0093 (https://doi.org/10.1109/WI.2016.0093)
Citations: 3

Abstract

Hierarchical clustering has been well studied in the machine learning community. Hierarchical clustering algorithms are deterministic and stable, and they do not need a predetermined number of clusters as input. However, their non-linear complexity means they do not scale to very large data sets. This paper proposes a new approach that reduces the complexity of hierarchical clustering, improves the purity of the resulting clusters, and reduces the chaining factor. The proposed method has three components: (i) a new combination similarity based on the common-neighbours concept from graph theory; (ii) in every iteration, the centroids of new clusters are estimated from the centroids of the previous iteration instead of being recalculated; and (iii) in each iteration, multiple pairs of objects are merged at the same time instead of only one pair. In addition to the proposed combination similarity, four well-known methods have been implemented: centroid-based, group-based, complete-link, and single-link. All five methods are evaluated using two metrics: purity and the imbalance (or chaining) factor. We show that the proposed algorithm outperforms the classic methods.
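The abstract does not give the paper's formulas, so the following is only a minimal sketch of two of the ideas it names: a common-neighbours similarity and merging several pairs per iteration, scored with purity. Everything concrete here is an assumption made for illustration, not the author's method: the function names (`common_neighbours`, `cluster_similarity`, `merge_round`, `purity`), the choice to average common-neighbour counts over cross-cluster pairs, the greedy best-first selection of disjoint pairs, and the toy graph. The centroid-estimation step (ii) is not sketched.

```python
from collections import Counter
from itertools import combinations


def common_neighbours(adj, a, b):
    """Count the nodes adjacent to both a and b in an undirected graph."""
    return len(adj[a] & adj[b])


def cluster_similarity(adj, c1, c2):
    """Assumed combination rule: mean common-neighbour count over cross pairs."""
    total = sum(common_neighbours(adj, a, b) for a in c1 for b in c2)
    return total / (len(c1) * len(c2))


def merge_round(adj, clusters, min_sim=0.0):
    """Merge several disjoint cluster pairs in one pass (greedy, best first)."""
    scored = [
        (cluster_similarity(adj, clusters[i], clusters[j]), i, j)
        for i, j in combinations(range(len(clusters)), 2)
    ]
    scored.sort(key=lambda t: t[0], reverse=True)

    used, merged = set(), []
    for score, i, j in scored:
        # Each cluster may take part in at most one merge per round.
        if score > min_sim and i not in used and j not in used:
            merged.append(clusters[i] | clusters[j])
            used.update((i, j))
    leftovers = [c for k, c in enumerate(clusters) if k not in used]
    return merged + leftovers


def purity(clusters, labels):
    """Fraction of points that carry the majority label of their cluster."""
    n = sum(len(c) for c in clusters)
    hits = sum(Counter(labels[v] for v in c).most_common(1)[0][1] for c in clusters)
    return hits / n


# Toy graph (hypothetical data): two 4-cliques, 0-3 and 4-7, joined by edge 3-4.
edges = [(a, b) for group in (range(0, 4), range(4, 8)) for a, b in combinations(group, 2)]
edges.append((3, 4))
adj = {v: set() for v in range(8)}
for a, b in edges:
    adj[a].add(b)
    adj[b].add(a)

labels = {v: "A" if v < 4 else "B" for v in range(8)}
clusters = [{v} for v in range(8)]
clusters = merge_round(adj, clusters)  # first pass: eight singletons become four pairs
clusters = merge_round(adj, clusters)  # second pass: four pairs become two clusters
print(clusters, purity(clusters, labels))
```

On this toy graph, two passes are enough to recover the two cliques, because points inside a clique share two neighbours while points on opposite sides of the bridge share at most one. A classic agglomerative scheme that merges one pair per iteration would need six iterations to reach the same two clusters.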