A Parallel Framework for Grid-Based Bottom-Up Subspace Clustering

Poonam Goyal, S. Kumari, Shubham Singh, V. Kishore, S. Balasubramaniam, Navneet Goyal
Published in: 2016 IEEE International Conference on Data Science and Advanced Analytics (DSAA), October 2016
DOI: 10.1109/DSAA.2016.42 (https://doi.org/10.1109/DSAA.2016.42)
Citation count: 5

Abstract

Clustering is a widely used data mining and machine learning technique that discovers patterns in unlabeled data by grouping similar objects together. Clustering high-dimensional data is challenging because points in high-dimensional space are nearly equidistant from one another, which renders commonly used similarity measures ineffective. Subspace clustering has emerged as a possible solution: instead of clustering in the full space, it searches for clusters in different subspaces of the dataset. Many subspace clustering algorithms have been proposed in the last two decades to find clusters in multiple overlapping subspaces of high-dimensional data. These algorithms iteratively search for the best subset of dimensions for a cluster among the 2^d - 1 possible combinations in d-dimensional data. This exhaustive search of subspaces makes subspace clustering extremely compute-intensive, especially for bottom-up algorithms. To address this issue, we develop an efficient parallel framework for grid-based bottom-up subspace clustering, designed around popular algorithms in this category. The framework is implemented for shared-memory, distributed-memory, and hybrid systems and is tested with three grid-based bottom-up subspace clustering algorithms: CLIQUE, MAFIA, and ENCLUS. All parallel implementations exhibit impressive speedup and scalability on real datasets.
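
To make the bottom-up search concrete, the following is a minimal serial sketch of a CLIQUE-style grid pass. It is not the framework described in the paper. The function names (`dense_units`, `bottom_up_subspaces`), the parameters `bins` and `threshold`, the assumption that every coordinate is scaled to [0, 1), and the sample points are all hypothetical choices made for illustration. As a simplification, pruning is done per subspace rather than per grid unit: a k-dimensional subspace is examined only if all of its (k-1)-dimensional projections already contain a dense cell, which avoids visiting most of the 2^d - 1 subspaces.

```python
from collections import Counter
from itertools import combinations


def dense_units(points, dims, bins, threshold):
    """Return the grid cells of subspace `dims` holding at least `threshold` points.

    Assumes every coordinate lies in [0, 1); each axis is cut into `bins` equal intervals.
    """
    counts = Counter(
        tuple(min(int(p[d] * bins), bins - 1) for d in dims) for p in points
    )
    return {cell for cell, n in counts.items() if n >= threshold}


def bottom_up_subspaces(points, n_dims, bins=4, threshold=3):
    """Map each subspace (a sorted tuple of dimensions) to its dense cells, level by level."""
    result = {}
    # Level 1: dense intervals in each single dimension.
    current = {}
    for d in range(n_dims):
        cells = dense_units(points, (d,), bins, threshold)
        if cells:
            current[(d,)] = cells
    while current:
        result.update(current)
        candidates = {}
        # Join two k-dimensional subspaces that share their first k-1 dimensions.
        for a, b in combinations(sorted(current), 2):
            if a[:-1] != b[:-1]:
                continue
            cand = a + (b[-1],)
            # Apriori-style pruning: every k-dimensional projection must be dense.
            if all(sub in current for sub in combinations(cand, len(cand) - 1)):
                cells = dense_units(points, cand, bins, threshold)
                if cells:
                    candidates[cand] = cells
        current = candidates
    return result


# Hypothetical data: four points cluster in dimensions 0 and 1; dimension 2 is spread out.
points = [
    (0.10, 0.10, 0.05), (0.12, 0.15, 0.30), (0.20, 0.20, 0.55),
    (0.15, 0.05, 0.80), (0.90, 0.60, 0.95), (0.60, 0.90, 0.40),
]
found = bottom_up_subspaces(points, n_dims=3)
for subspace in sorted(found, key=lambda s: (len(s), s)):
    print(subspace, sorted(found[subspace]))
```

With three dimensions there are 2^3 - 1 = 7 non-empty subspaces, but the sketch counts points in only four of them (the three single dimensions and the pair (0, 1)), because dimension 2 has no dense interval and is pruned at the first level. The per-subspace counting inside each level is independent work, which is the kind of step a parallel implementation could distribute; how the paper's framework actually partitions it is not stated in the abstract.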