Subspace anytime stream clustering

Scientific and statistical database management : International Conference, SSDBM ... : proceedings. International Conference on Scientific and Statistical Database Management Pub Date : 2014-06-30 DOI:10.1145/2618243.2618286

Marwan Hassani, P. Kranen, Rajveer Saini, T. Seidl

Citations: 23

Abstract

Clustering high-dimensional streaming data is an emerging field of research. A real-life data stream poses many challenges for the clustering task because an endless amount of data arrives continuously. Much research has addressed full-space stream clustering. To handle the varying speed of a data stream, "anytime" algorithms have been proposed, but so far only for full-space stream clustering. However, data streams from many application domains contain an abundance of dimensions; clusters often exist only in specific subspaces (subsets of the dimensions) and do not show up in the full feature space. This paper proposes the first algorithm that considers both the high dimensionality and the varying speed of streaming data. The algorithm, called SubClusTree, adapts flexibly to different stream speeds and makes the best use of the available time to provide a high-quality subspace clustering. The experimental results demonstrate the effectiveness of our anytime subspace concept.
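
The claim that clusters can exist in a subspace without showing up in the full feature space can be illustrated with a small sketch. This is not the SubClusTree algorithm; the toy data, the function names, and the separation criterion below are all assumptions made for illustration. Two groups are tight in dimension 0, while dimension 1 is spread widely and carries no cluster structure.

```python
from math import dist

# Hypothetical toy data (not from the paper): two groups that are
# tight in dimension 0 only; dimension 1 is widely spread noise.
group_a = [(1.0, 0.0), (1.2, 50.0), (0.9, 100.0)]
group_b = [(3.0, 2.0), (3.1, 52.0), (2.9, 98.0)]

def project(point, dims):
    """Keep only the coordinates listed in dims."""
    return tuple(point[d] for d in dims)

def max_within(group, dims):
    """Largest distance between two points of the same group."""
    pts = [project(p, dims) for p in group]
    return max(dist(p, q) for p in pts for q in pts)

def min_between(g1, g2, dims):
    """Smallest distance between a point of g1 and a point of g2."""
    return min(dist(project(p, dims), project(q, dims))
               for p in g1 for q in g2)

def separated(g1, g2, dims):
    """Assumed criterion: every cross-group distance exceeds
    every within-group distance in the chosen dimensions."""
    within = max(max_within(g1, dims), max_within(g2, dims))
    return min_between(g1, g2, dims) > within

full_space = separated(group_a, group_b, dims=(0, 1))
subspace = separated(group_a, group_b, dims=(0,))
print(full_space, subspace)  # prints: False True
```

In the full space the noisy dimension dominates the distances, so the two groups are not separated; restricted to dimension 0 they are clearly distinct.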
