Two Step Clustering Model for K-Means Algorithm
Narongsak Chayangkoon, A. Srivihok
International Conference on Network, Communication and Computing, 2016-12-17
DOI: https://doi.org/10.1145/3033288.3033347
Citation count: 5
Abstract
In this paper, we propose a Two Step Clustering Model for finding the number of clusters for the K-Means algorithm. This Hybrid Model addresses a weakness of K-Means that particularly affects general users: the number of clusters has to be chosen before cluster analysis can be run. In the experiment, we used 10 datasets from the UCI Machine Learning Repository. For feature selection, we used three approaches. The first used Best First search with Correlation-Based Feature Subset Selection. The second used Ranker with Principal Component Analysis. The third used Best First search with the Wrapper Subset Evaluator, with a Naïve Bayes classifier for classification. The baseline model used neither a search method nor feature selection. Moreover, we compared the performance of three algorithms for finding the value of k: Expectation Maximization (EM), Cascade K-Means, and Canopy. We also evaluated the performance of the Hybrid Model, comparing the clusterings by the Sum of Squared Errors criterion. The resulting Hybrid Model uses Ranker as the search method and Principal Component Analysis as the evaluator; for clustering, it uses Expectation Maximization to find the number of clusters and Simple K-Means for the cluster analysis. Our experimental results showed that this best Hybrid Model outperformed the other algorithms considered, achieving the lowest Sum of Squared Errors.
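The two-step pipeline described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract does not name a toolkit, so the use of scikit-learn is an assumption, and selecting k by the lowest BIC over EM-fitted Gaussian mixtures is a stand-in for whatever criterion the authors' EM step used. The function name `two_step_cluster` and its parameters (`max_k`, `variance_kept`, `seed`) are hypothetical.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.mixture import GaussianMixture
from sklearn.preprocessing import StandardScaler


def two_step_cluster(X, max_k=10, variance_kept=0.95, seed=0):
    """Step 1: PCA + EM to choose k. Step 2: K-Means with that k.

    Returns (chosen k, Sum of Squared Errors, cluster labels).
    """
    # Feature extraction: standardize, then keep enough principal
    # components to explain `variance_kept` of the variance.
    X_std = StandardScaler().fit_transform(X)
    Z = PCA(n_components=variance_kept, svd_solver="full").fit_transform(X_std)

    # Step 1: fit a Gaussian mixture by EM for each candidate k and
    # keep the k with the lowest BIC (assumed selection criterion).
    candidates = list(range(1, max_k + 1))
    bics = [
        GaussianMixture(n_components=k, random_state=seed).fit(Z).bic(Z)
        for k in candidates
    ]
    best_k = candidates[int(np.argmin(bics))]

    # Step 2: run K-Means with the chosen k. `inertia_` is the sum of
    # squared distances of samples to their closest centroid, i.e. the SSE.
    km = KMeans(n_clusters=best_k, n_init=10, random_state=seed).fit(Z)
    return best_k, float(km.inertia_), km.labels_


if __name__ == "__main__":
    X = load_iris().data  # Iris is one of the UCI datasets; used here only as an example
    k, sse, labels = two_step_cluster(X)
    print(f"chosen k = {k}, SSE = {sse:.2f}")
```

Note that the SSE here is measured in the PCA-transformed space, so it is only comparable between runs that share the same preprocessing.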