Two Step Clustering Model for K-Means Algorithm

International Conference on Network, Communication and Computing. Pub Date: 2016-12-17. DOI: 10.1145/3033288.3033347

Narongsak Chayangkoon, A. Srivihok


Citations: 5

Abstract

In this paper, we propose a Two Step Clustering Model for finding the number of clusters for the K-Means algorithm. K-Means requires the number of clusters to be chosen in advance, which is a weakness especially for general users who want to run a cluster analysis with it. We address this problem by proposing a Hybrid Model. In the experiment, we used 10 datasets from the UCI Machine Learning Repository. For feature selection, we used three approaches. The first used Best First Search with Correlation-Based Feature Subset Selection. The second used Ranker with Principal Component Analysis. The third used Best First Search with the Wrapper Subset Evaluator, with a Naïve Bayes classifier for classification. The baseline model used neither a search method nor feature selection. We also compared three algorithms for finding the value of k: Expectation Maximization (EM), Cascade K-Means, and Canopy. We evaluated the performance of the Hybrid Model and compared the clustering results using the Sum of Squared Errors as the criterion. The resulting Hybrid Model uses Ranker as the search method and Principal Component Analysis as the evaluator; for clustering, it uses Expectation Maximization to find the number of clusters and Simple K-Means for the cluster analysis. Our experimental results showed that this best Hybrid Model achieved higher performance than the other algorithms compared, with the lowest Sum of Squared Errors.
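The Sum of Squared Errors criterion and the final K-Means step mentioned in the abstract can be sketched in a few lines of standard-library Python. This is a minimal illustration, not the authors' implementation: the toy `POINTS` data, the function names, and the deterministic farthest-first initialisation are all assumptions made here so the example is reproducible. The step that estimates k (EM in the paper) is not implemented; k is simply passed in.

```python
# Illustrative sketch only: plain Lloyd's k-means plus the SSE criterion.
# Names, data, and the farthest-first initialisation are assumptions,
# not taken from the paper.

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def centroid(points):
    """Component-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(col) / n for col in zip(*points))

def farthest_first(points, k):
    """Deterministic initial centres: start at points[0], then repeatedly
    add the point farthest from the centres chosen so far."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(
            max(points, key=lambda p: min(sq_dist(p, c) for c in centers))
        )
    return centers

def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm; returns (centers, labels)."""
    centers = farthest_first(points, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centre.
        new_labels = [
            min(range(k), key=lambda j: sq_dist(p, centers[j])) for p in points
        ]
        if new_labels == labels:  # assignments stable, so converged
            break
        labels = new_labels
        # Update step: move each centre to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centre if a cluster is empty
                centers[j] = centroid(members)
    return centers, labels

def sse(points, centers, labels):
    """Sum of squared distances from each point to its assigned centre."""
    return sum(sq_dist(p, centers[lab]) for p, lab in zip(points, labels))

# Toy data (assumed): two well-separated groups of four points each.
POINTS = [
    (0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0),
    (10.0, 10.0), (10.0, 11.0), (11.0, 10.0), (11.0, 11.0),
]

for k in (1, 2, 3):
    centers, labels = kmeans(POINTS, k)
    print(k, round(sse(POINTS, centers, labels), 2))
```

On this toy data the SSE falls from 404 at k = 1 to 4 at k = 2 and only slightly further at k = 3, which is why a separate step for choosing k is useful before the K-Means run: SSE on its own keeps decreasing as clusters are added.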



