A new online clustering approach for data in arbitrary shaped clusters

2015 IEEE 2nd International Conference on Cybernetics (CYBCONF) Pub Date: 2015-06-01 DOI: 10.1109/CYBConf.2015.7175937

Richard Hyde, P. Angelov

Citations: 16

Abstract

In this paper we demonstrate a new density-based clustering technique, CODAS, for online clustering of streaming data into arbitrarily shaped clusters. CODAS is a two-stage process that uses a simple local density measure to initiate micro-clusters, which are then combined into clusters. Memory efficiency is gained by not storing or re-using any data. Computational efficiency is gained by using hyper-spherical micro-clusters to achieve a fast, dimensionally independent micro-cluster joining technique. The micro-clusters divide the data space into sub-spaces with a core region and a non-core region. Core regions that intersect define the clusters. A threshold value is used to identify outlier micro-clusters separately from small clusters of unusual data. The cluster information is fully maintained online. In this paper we compare CODAS with ELM, DEC, Chameleon, DBSCAN and DenStream and demonstrate that CODAS achieves comparable results but in a fully online and dimensionally scalable manner.
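
The two stages described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the class name `MicroClusterSketch`, the parameters `radius`, `core_radius` and `min_count`, the running-mean centre update, and the rule that two cores intersect when their centres are at most `2 * core_radius` apart are all assumptions made for illustration. The abstract only states that micro-clusters are hyper-spherical, have core and non-core regions, are joined when cores intersect, and are filtered by a threshold.

```python
import math
from dataclasses import dataclass


def euclidean(a, b):
    """Euclidean distance between two equal-length coordinate sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


@dataclass
class MicroCluster:
    center: list      # centre of the hyper-sphere
    count: int = 1    # number of samples absorbed; the samples are not stored


class MicroClusterSketch:
    """Hypothetical illustration of online micro-clustering with core joining."""

    def __init__(self, radius, core_radius, min_count):
        self.radius = radius            # assumed fixed micro-cluster radius
        self.core_radius = core_radius  # assumed radius of the core region
        self.min_count = min_count      # assumed outlier threshold on sample count
        self.micro = []

    def add(self, point):
        """Stage 1: assign a sample to the nearest enclosing micro-cluster or start a new one."""
        nearest, nearest_d = None, math.inf
        for mc in self.micro:
            d = euclidean(point, mc.center)
            if d <= self.radius and d < nearest_d:
                nearest, nearest_d = mc, d
        if nearest is None:
            self.micro.append(MicroCluster(list(point)))
            return
        nearest.count += 1
        # Assumption: only samples that land in the core region move the centre.
        if nearest_d <= self.core_radius:
            n = nearest.count
            nearest.center = [c + (p - c) / n for c, p in zip(nearest.center, point)]

    def clusters(self):
        """Stage 2: join micro-clusters above the threshold whose cores intersect."""
        dense = [i for i, mc in enumerate(self.micro) if mc.count >= self.min_count]
        parent = {i: i for i in dense}

        def find(i):
            while parent[i] != i:
                parent[i] = parent[parent[i]]  # path halving
                i = parent[i]
            return i

        for a in dense:
            for b in dense:
                if a < b:
                    gap = euclidean(self.micro[a].center, self.micro[b].center)
                    # Two spheres of radius core_radius intersect when the
                    # centre distance does not exceed the sum of their radii.
                    if gap <= 2 * self.core_radius:
                        parent[find(a)] = find(b)

        groups = {}
        for i in dense:
            groups.setdefault(find(i), []).append(i)
        return sorted(groups.values())
```

Each call to `add` consumes one sample and keeps only a centre and a count per micro-cluster, which mirrors the abstract's point about not storing data. `clusters()` returns lists of micro-cluster indices and leaves out any micro-cluster below `min_count`, treating it as an outlier under the assumed threshold rule.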

