{"title":"CURE-NS: a hierarchical clustering algorithm with new shrinking scheme","authors":"Y. Qian, Qingzhang Shi, Qi Wang","doi":"10.1109/ICMLC.2002.1174512","DOIUrl":null,"url":null,"abstract":"CURE (clustering using representatives) is an efficient clustering algorithm for large databases, which is more robust to outliers compared with other clustering methods, and identifies clusters having non-spherical shapes and wide variances in size. CURE employs a fixed number or representative points to describe the cluster, and the set of representative points are first chosen randomly, and then are shrunk toward the mean of cluster. The shrinking operation plays a key role in CURE, which is used for weakening the effect of outliers. However, we found that the shrinking scheme of CURE is dependent on a hidden assumption of spherical shape of cluster, therefore CURE has difficulties in dealing with databases having specific shapes. In this paper, CURE-NS (CURE with new shrinking scheme) is proposed to overcome this problem, which uses the difference of density values of the representative points to determine the direction and distance of shrinking. Our shrinking scheme has nothing to do with the shape of cluster. A range of experiments demonstrate that CURE-NS has better clustering performance than CURE.","PeriodicalId":90702,"journal":{"name":"Proceedings. International Conference on Machine Learning and Cybernetics","volume":"12 1","pages":"895-899 vol.2"},"PeriodicalIF":0.0000,"publicationDate":"2002-11-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"15","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings. International Conference on Machine Learning and Cybernetics","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICMLC.2002.1174512","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 15
Abstract
CURE (clustering using representatives) is an efficient clustering algorithm for large databases, which is more robust to outliers compared with other clustering methods, and identifies clusters having non-spherical shapes and wide variances in size. CURE employs a fixed number or representative points to describe the cluster, and the set of representative points are first chosen randomly, and then are shrunk toward the mean of cluster. The shrinking operation plays a key role in CURE, which is used for weakening the effect of outliers. However, we found that the shrinking scheme of CURE is dependent on a hidden assumption of spherical shape of cluster, therefore CURE has difficulties in dealing with databases having specific shapes. In this paper, CURE-NS (CURE with new shrinking scheme) is proposed to overcome this problem, which uses the difference of density values of the representative points to determine the direction and distance of shrinking. Our shrinking scheme has nothing to do with the shape of cluster. A range of experiments demonstrate that CURE-NS has better clustering performance than CURE.
CURE (clustering using representative)是一种高效的大型数据库聚类算法,与其他聚类方法相比,它对离群值的鲁棒性更强,能够识别形状非球形且大小差异较大的聚类。CURE采用固定数量或代表性点来描述聚类,首先随机选取代表性点集,然后向聚类均值缩小。缩小操作在CURE中起着关键作用,它用于减弱离群值的影响。然而,我们发现CURE的收缩方案依赖于一个隐藏的假设,即簇的球形,因此CURE在处理具有特定形状的数据库时存在困难。为了克服这一问题,本文提出了CURE- ns (CURE with new shrink scheme)算法,利用代表点的密度值差异来确定收缩的方向和距离。我们的收缩方案与星团的形状无关。一系列实验表明,CURE- ns具有比CURE更好的聚类性能。