SUBSCALE: Fast and Scalable Subspace Clustering for High Dimensional Data

2014 IEEE International Conference on Data Mining Workshop. Pub Date: 2014-12-01. DOI: 10.1109/ICDMW.2014.100

Amardeep Kaur, A. Datta


Citations: 7

Abstract




Subspace clustering aims to find groups of similar data points in all possible subspaces of a dataset. Because the number of subspaces is exponential in the number of dimensions, subspace clustering is usually computationally very expensive, and the performance of existing algorithms deteriorates drastically as dimensionality increases. Most of them use a bottom-up search strategy, and their inefficiency has two main causes: (1) multiple database scans and (2) implicit or explicit generation of trivial subspace clusters during the process. We present SUBSCALE, a novel algorithm that directly finds the non-trivial subspace clusters with minimal cost and requires only k database scans for a k-dimensional dataset. Our algorithm scales very well with dimensionality and is highly parallelizable. The experimental evaluation has shown promising results.
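The exponential growth mentioned in the abstract can be made concrete with a minimal sketch. It assumes that a subspace means any non-empty subset of the original dimensions (axis-parallel subspaces), which gives 2^k - 1 candidates for k dimensions. The function names `count_subspaces` and `enumerate_subspaces` are hypothetical helpers for illustration only; this is not the SUBSCALE algorithm, whose details are not given here.

```python
from itertools import combinations


def count_subspaces(k: int) -> int:
    """Number of non-empty axis-parallel subspaces of a k-dimensional dataset.

    Assumption: a subspace is any non-empty subset of the k dimensions.
    """
    return 2 ** k - 1


def enumerate_subspaces(k: int):
    """Yield every non-empty subset of the dimension indices 0..k-1,
    smallest subsets first (the order a bottom-up search would visit them)."""
    dims = range(k)
    for size in range(1, k + 1):
        yield from combinations(dims, size)


if __name__ == "__main__":
    # The count roughly doubles with every added dimension.
    for k in (3, 10, 20, 50):
        print(f"k={k}: {count_subspaces(k)} subspaces")

    # Explicit enumeration is only feasible for very small k.
    print(list(enumerate_subspaces(3)))
```

Exhaustive enumeration of this kind is practical only for a handful of dimensions, which is why an approach that avoids visiting every lower-dimensional subspace matters for high-dimensional data.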

