A Novel Approach for Clustering High Dimensional Data Using Kernal Hubness
Authors: M. Amina, Farook K. Syed
Published in: 2015 Fifth International Conference on Advances in Computing and Communications (ICACC)
Publication date: 2015-09-01
DOI: 10.1109/ICACC.2015.67 (https://doi.org/10.1109/ICACC.2015.67)
Citations: 1
Abstract
Clustering high-dimensional data, which now arises in almost every field, is becoming a very tedious process. The key disadvantage of high-dimensional data is the curse of dimensionality. As dimensionality grows, the data points become sparse and the density of any region decreases, which makes the data difficult to cluster and degrades the performance of traditional clustering algorithms. Hubness-based algorithms were introduced to address these difficulties. These algorithms exploit the way data points are distributed across k-nearest-neighbor lists. Hubness is an unsupervised measure that identifies which points appear more frequently in k-nearest-neighbor lists than other points in the dataset. Three algorithms are mainly used for hub-based clustering: K-hubs, Hubness Proportional Clustering, and Hubness Proportional K-Means. The K-hubs algorithm is used to initialize the hubs for the clusters. The Hubness Proportional Clustering (HPC) algorithm is used to group probabilistic data models. The Hubness Proportional K-Means (HPKM) algorithm integrates hubness-based centroid selection with the partitioning process. These algorithms are used to increase the efficiency and the prediction accuracy of the system. Their main drawback is that the number of iterations increases as dimensionality increases. To overcome this drawback, a new algorithm is proposed that combines kernel mapping with the hubness phenomenon. The proposed algorithm detects arbitrarily shaped clusters in the dataset and improves cluster quality by minimizing the intra-cluster distance and maximizing the inter-cluster distance.
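The hubness score described above (how often a point appears in other points' k-nearest-neighbor lists) can be illustrated with a minimal sketch. The abstract does not specify the kernel or the exact procedure, so the RBF kernel, the `gamma` parameter, and the function names `rbf_kernel_distances` and `hubness_scores` are assumptions made here for illustration only, not the authors' implementation.

```python
import numpy as np


def rbf_kernel_distances(X, gamma=1.0):
    """Pairwise distances in the feature space induced by an RBF kernel.

    Assumption: k(x, y) = exp(-gamma * ||x - y||^2). The squared
    feature-space distance is k(x, x) + k(y, y) - 2 k(x, y) = 2 - 2 k(x, y).
    """
    sq_norms = np.sum(X ** 2, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (X @ X.T)
    sq_dists = np.maximum(sq_dists, 0.0)  # guard against tiny negatives
    K = np.exp(-gamma * sq_dists)
    return np.sqrt(np.maximum(2.0 - 2.0 * K, 0.0))


def hubness_scores(D, k):
    """Count how many times each point occurs in other points' k-NN lists."""
    n = D.shape[0]
    D = D.copy()
    np.fill_diagonal(D, np.inf)  # a point is not its own neighbor
    knn = np.argsort(D, axis=1)[:, :k]  # indices of the k nearest neighbors
    return np.bincount(knn.ravel(), minlength=n)


# Four points on a line; the second point is the nearest neighbor of two others.
X = np.array([[0.0], [1.0], [2.5], [10.0]])
D = rbf_kernel_distances(X, gamma=0.01)
print(hubness_scores(D, k=1))  # [1 2 1 0]
```

Points with the highest scores are the hubs. In a K-hubs style initialization they would serve as candidate cluster centers, although the selection rule used in the paper is not stated in the abstract.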