{"title":"Scalable clustering and applications","authors":"Shahid K I, S. Chaudhury","doi":"10.1145/3009977.3010073","DOIUrl":null,"url":null,"abstract":"Large scale machine learning is becoming an active research area recently. Most of the existing clustering algorithms cannot handle big data due to its high time and space complexity. Among the clustering algorithms, eigen vector based clustering, such as Spectral clustering, shows very good accuracy, but it has cubic time complexity. There are various methods proposed to reduce the time and space complexity for eigen decomposition such as Nyström method, Lanc-zos method etc. Nyström method has linear time complexity in terms of number of data points, but has cubic time complexity in terms of number of sampling points. To reduce this, various Rank k approximation methods also proposed, but which are less efficient compare to the normalized spectral clustering. In this paper we propose a two step algorithm for spectral clustering to reduce the time complexity toO(nmk + m2k'), by combining both Nyström and Lanczos method, where k is the number of clusters and k' is the rank k approximation of the sampling matrix (k < k' << m << n). It shows very good results, with various data sets, image segmentation problems and churn prediction of a telecommunication data set, even with very low sampling (for 10 Million × 10 Million matrix, sampled only 100 columns) with lesser time, which confirms the validity of the algorithm.","PeriodicalId":93806,"journal":{"name":"Proceedings. Indian Conference on Computer Vision, Graphics & Image Processing","volume":"92 1","pages":"34:1-34:7"},"PeriodicalIF":0.0000,"publicationDate":"2016-12-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings. Indian Conference on Computer Vision, Graphics & Image Processing","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3009977.3010073","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0
Abstract
Large scale machine learning is becoming an active research area recently. Most of the existing clustering algorithms cannot handle big data due to its high time and space complexity. Among the clustering algorithms, eigen vector based clustering, such as Spectral clustering, shows very good accuracy, but it has cubic time complexity. There are various methods proposed to reduce the time and space complexity for eigen decomposition such as Nyström method, Lanc-zos method etc. Nyström method has linear time complexity in terms of number of data points, but has cubic time complexity in terms of number of sampling points. To reduce this, various Rank k approximation methods also proposed, but which are less efficient compare to the normalized spectral clustering. In this paper we propose a two step algorithm for spectral clustering to reduce the time complexity toO(nmk + m2k'), by combining both Nyström and Lanczos method, where k is the number of clusters and k' is the rank k approximation of the sampling matrix (k < k' << m << n). It shows very good results, with various data sets, image segmentation problems and churn prediction of a telecommunication data set, even with very low sampling (for 10 Million × 10 Million matrix, sampled only 100 columns) with lesser time, which confirms the validity of the algorithm.