{"title":"Mining Maximal Subspace Clusters to deal with Inter-Subspace Density Divergence","authors":"B. Lakshmi, K. Madhuri","doi":"10.5815/ijmsc.2019.03.04","DOIUrl":null,"url":null,"abstract":"In general, subspace clustering algorithms identify enormously large number of subspace clusters which may possibly involve redundant clusters. This paper presents Dynamic Epsilon based Maximal Subspace Clustering Algorithm (DEMSC) that handles both redundancy and inter-subspace density divergence, a phenomenon in density based subspace clustering. The proposed algorithm aims to mine maximal and non-redundant subspace clusters. A maximal subspace cluster is defined by a group of similar data objects that share maximal number of attributes. The DEMSC algorithm consists of four steps. In the first step, data points are assigned with random unique positive integers called labels. In the second step, dense units are identified based on the density notion using proposed dynamically computed epsilon-radius specific to each subspace separately and user specified input parameter minimum points , τ. In the third step, sum of the labels of each data object forming the dense unit is calculated to compute its signature and is hashed into the hash table. Finally, if a dense unit of a particular subspace collides with that of the other subspace in the hash table, then both the dense units exists with high probability in the subspace formed by combining the colliding subspaces. With this approach efficient maximal subspace clusters which are non-redundant are identified and outperforms the existing algorithms in terms of cluster quality and number of the resulted subspace clusters when experimented on different benchmark datasets.","PeriodicalId":312036,"journal":{"name":"International Journal of Mathematical Sciences and Computing","volume":"128 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2019-07-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"International Journal of Mathematical Sciences and Computing","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.5815/ijmsc.2019.03.04","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0
Abstract
In general, subspace clustering algorithms identify enormously large number of subspace clusters which may possibly involve redundant clusters. This paper presents Dynamic Epsilon based Maximal Subspace Clustering Algorithm (DEMSC) that handles both redundancy and inter-subspace density divergence, a phenomenon in density based subspace clustering. The proposed algorithm aims to mine maximal and non-redundant subspace clusters. A maximal subspace cluster is defined by a group of similar data objects that share maximal number of attributes. The DEMSC algorithm consists of four steps. In the first step, data points are assigned with random unique positive integers called labels. In the second step, dense units are identified based on the density notion using proposed dynamically computed epsilon-radius specific to each subspace separately and user specified input parameter minimum points , τ. In the third step, sum of the labels of each data object forming the dense unit is calculated to compute its signature and is hashed into the hash table. Finally, if a dense unit of a particular subspace collides with that of the other subspace in the hash table, then both the dense units exists with high probability in the subspace formed by combining the colliding subspaces. With this approach efficient maximal subspace clusters which are non-redundant are identified and outperforms the existing algorithms in terms of cluster quality and number of the resulted subspace clusters when experimented on different benchmark datasets.