Tenia Wahyuningrum, S. Khomsah, S. Suyanto, Selly Meliana, Prasti Eko Yunanto, W. A. Al Maki
Published in: 2021 4th International Seminar on Research of Information Technology and Intelligent Systems (ISRITI)
Publication date: 2021-12-16
DOI: https://doi.org/10.1109/ISRITI54043.2021.9702823
Citations: 1
Improving Clustering Method Performance Using K-Means, Mini Batch K-Means, BIRCH and Spectral
The most pressing problem of the $k$-Nearest Neighbor (KNN) classification method is its voting technique, which leads to poor accuracy on some complex, randomly distributed data sets. To overcome this weakness, we add a step before the KNN classification phase. We develop a new scheme that groups the data set into clusters, with the number of clusters greater than the number of data classes. A committee is then selected from each cluster, so the method does not rely on voting as the standard KNN method does. The study thus applies two methods in sequence: a clustering method followed by KNN. The clustering method groups records into multiple clusters, and committees are selected from these clusters. Five clustering methods were tested: K-Means, K-Means with Principal Component Analysis (PCA), Mini Batch K-Means, Spectral clustering, and Balanced Iterative Reducing and Clustering using Hierarchies (BIRCH). All tested clustering methods are of the centroid-based type. According to the results, BIRCH has the lowest error rate among the five clustering methods (2.13), and K-Means has the largest clusters (156.63).
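The two-stage idea (cluster first, then classify against a committee instead of voting) can be illustrated with a minimal sketch. The abstract does not say how a committee is chosen, so the rule used here is an assumption: each cluster contributes the one training point closest to its centroid, and a query takes the label of the nearest committee member. The toy data, the function names (`kmeans`, `build_committee`, `classify`), and the deterministic centroid initialisation are all hypothetical and are not taken from the paper.

```python
import math

def nearest(p, centers):
    """Index of the center closest to point p (Euclidean distance)."""
    return min(range(len(centers)), key=lambda i: math.dist(p, centers[i]))

def kmeans(points, k, iters=20):
    """Plain Lloyd's k-means. Initial centroids are evenly spaced picks
    from the input list (a simplification; real runs usually use random
    or k-means++ initialisation)."""
    centroids = [points[i * len(points) // k] for i in range(k)]
    for _ in range(iters):
        assign = [nearest(p, centroids) for p in points]
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:  # leave an empty cluster's centroid unchanged
                centroids[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return centroids, [nearest(p, centroids) for p in points]

def build_committee(points, labels, k):
    """ASSUMED committee rule: one representative per cluster, namely the
    training point nearest the cluster centroid, keeping its own label."""
    centroids, assign = kmeans(points, k)
    committee = []
    for c, center in enumerate(centroids):
        members = [i for i, a in enumerate(assign) if a == c]
        if members:
            rep = min(members, key=lambda i: math.dist(points[i], center))
            committee.append((points[rep], labels[rep]))
    return committee

def classify(query, committee):
    """Label of the single nearest committee member (no majority vote)."""
    _, label = min(committee, key=lambda m: math.dist(query, m[0]))
    return label

# Hypothetical toy data: two classes, each occupying two separate regions,
# so four clusters (more than the two classes) are needed to cover them.
points = [(0, 0), (0, 1), (1, 0), (1, 1),        # class "a"
          (0, 9), (0, 10), (1, 9), (1, 10),      # class "b"
          (9, 0), (9, 1), (10, 0), (10, 1),      # class "b"
          (9, 9), (9, 10), (10, 9), (10, 10)]    # class "a"
labels = ["a"] * 4 + ["b"] * 8 + ["a"] * 4

committee = build_committee(points, labels, k=4)
print(classify((2, 2), committee))  # prints a
print(classify((8, 1), committee))  # prints b
```

Using more clusters than classes lets a class that occupies several separate regions be represented by several committee members. Swapping the clustering step for Mini Batch K-Means, BIRCH, or Spectral clustering would change only how the cluster assignments are produced.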