Mining Maximal Subspace Clusters to deal with Inter-Subspace Density Divergence

B. Lakshmi, K. Madhuri
International Journal of Mathematical Sciences and Computing. Published 2019-07-08. DOI: 10.5815/ijmsc.2019.03.04 (https://doi.org/10.5815/ijmsc.2019.03.04)

Abstract

Subspace clustering algorithms generally produce a very large number of subspace clusters, many of which may be redundant. This paper presents the Dynamic Epsilon based Maximal Subspace Clustering Algorithm (DEMSC), which handles both redundancy and inter-subspace density divergence, a phenomenon that arises in density-based subspace clustering. The proposed algorithm aims to mine maximal, non-redundant subspace clusters. A maximal subspace cluster is a group of similar data objects that share a maximal number of attributes. The DEMSC algorithm consists of four steps. In the first step, each data point is assigned a random, unique positive integer called a label. In the second step, dense units are identified using a density notion based on two quantities: a proposed epsilon-radius that is computed dynamically and separately for each subspace, and a user-specified minimum number of points, τ. In the third step, the labels of the data objects forming each dense unit are summed to compute the unit's signature, which is hashed into a hash table. Finally, if a dense unit of one subspace collides in the hash table with a dense unit of another subspace, then with high probability the dense unit also exists in the subspace formed by combining the colliding subspaces. In experiments on several benchmark datasets, this approach efficiently identifies maximal, non-redundant subspace clusters and outperforms existing algorithms in terms of cluster quality and the number of resulting subspace clusters.
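
The four steps can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the formula for the dynamic epsilon-radius, so `dynamic_epsilon` below is a hypothetical placeholder (a fraction of the value range in each dimension), and the dense-unit rule (τ consecutive sorted points whose spread is at most epsilon) is a simplifying assumption. All function and parameter names (`assign_labels`, `dense_units_1d`, `maximal_subspaces`, `scale`, `seed`) are illustrative, and only single-dimension subspaces are combined.

```python
import random
from collections import defaultdict


def assign_labels(n_points, seed=0):
    """Step 1: give every point a random, unique positive integer label."""
    rng = random.Random(seed)
    # Sampling without replacement guarantees uniqueness.
    return rng.sample(range(1, 10**12), n_points)


def dynamic_epsilon(values, scale=0.1):
    """Hypothetical per-subspace radius (assumption, not the paper's formula):
    a fixed fraction of the value range observed in this dimension."""
    return scale * (max(values) - min(values))


def dense_units_1d(values, eps, tau):
    """Step 2 (simplified): every run of tau consecutive sorted points
    whose spread is at most eps counts as one dense unit."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    units = []
    for start in range(len(order) - tau + 1):
        window = order[start:start + tau]
        if values[window[-1]] - values[window[0]] <= eps:
            units.append(frozenset(window))
    return units


def maximal_subspaces(data, tau, scale=0.1, seed=0):
    """Steps 3 and 4: hash label-sum signatures and report collisions.

    Returns a dict mapping each dense unit (a frozenset of point indices)
    to the sorted tuple of dimensions in which its signature collided.
    """
    labels = assign_labels(len(data), seed)
    table = defaultdict(set)  # signature -> dimensions where it was seen
    members = {}              # signature -> point indices of the dense unit
    for d in range(len(data[0])):
        column = [row[d] for row in data]
        eps = dynamic_epsilon(column, scale)  # recomputed for each subspace
        for unit in dense_units_1d(column, eps, tau):
            signature = sum(labels[i] for i in unit)  # Step 3
            table[signature].add(d)
            members[signature] = unit
    # Step 4: a signature seen in more than one dimension suggests the same
    # dense unit exists in the combined subspace.
    return {members[s]: tuple(sorted(ds)) for s, ds in table.items() if len(ds) > 1}
```

Because two different point sets can in principle sum to the same signature, a collision only indicates a shared dense unit with high probability, which matches the wording of the abstract; drawing labels from a large range makes accidental collisions unlikely. The sketch keeps only the most recent unit per signature and does not verify collisions.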