Improving Clustering Method Performance Using K-Means, Mini Batch K-Means, BIRCH and Spectral

Tenia Wahyuningrum, S. Khomsah, S. Suyanto, Selly Meliana, Prasti Eko Yunanto, W. A. Al Maki
Venue: 2021 4th International Seminar on Research of Information Technology and Intelligent Systems (ISRITI)
Publication date: 2021-12-16
DOI: 10.1109/ISRITI54043.2021.9702823 (https://doi.org/10.1109/ISRITI54043.2021.9702823)
Citations: 1

Abstract

The most pressing problem of the $k$-Nearest Neighbor (KNN) classification method is its voting technique, which leads to poor accuracy on some randomly distributed, complex data sets. To overcome this weakness, we add a step before the KNN classification phase. We develop a new scheme for grouping data sets in which the number of clusters is greater than the number of data classes. In addition, a committee is selected from each cluster, so the method does not rely on voting as standard KNN does. This study therefore uses two sequential methods: a clustering method followed by the KNN method. The clustering method groups records into multiple clusters, and committees are then selected from these clusters. Five clustering methods were tested: K-Means, K-Means with Principal Component Analysis (PCA), Mini Batch K-Means, Spectral, and Balanced Iterative Reducing and Clustering using Hierarchies (BIRCH). All tested clustering methods are of the centroid-based cluster type. According to the results, BIRCH has the lowest error rate among the five clustering methods (2.13), and K-Means has the largest clusters (156.63).
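
The two-stage idea in the abstract (cluster first, with more clusters than classes, then classify against per-cluster committees instead of voting) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the abstract does not say how a committee is chosen, so the sketch assumes one representative per cluster (the member nearest the centroid) and assigns a query the label of its single nearest representative. The names `kmeans`, `select_committee`, and `classify`, the toy data, and the choice of `k=4` are all hypothetical.

```python
import math
import random


def dist(a, b):
    """Euclidean distance between two equal-length points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's k-means; returns (centroids, assignments)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    assign = None
    for _ in range(iters):
        # Assignment step: attach each point to its nearest centroid.
        new_assign = [min(range(k), key=lambda c: dist(p, centroids[c]))
                      for p in points]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, new_assign) if a == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
        if new_assign == assign:
            break  # assignments stopped changing
        assign = new_assign
    return centroids, assign


def select_committee(points, labels, centroids, assign):
    """ASSUMED rule: one representative per cluster, the member nearest its centroid."""
    committee = []
    for c, centroid in enumerate(centroids):
        members = [i for i, a in enumerate(assign) if a == c]
        if not members:
            continue
        rep = min(members, key=lambda i: dist(points[i], centroid))
        committee.append((points[rep], labels[rep]))
    return committee


def classify(x, committee):
    """Return the label of the single nearest committee member (no majority vote)."""
    return min(committee, key=lambda m: dist(x, m[0]))[1]


# Hypothetical toy data: two classes, clustered into k=4 groups (more clusters than classes).
points = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3), (1.0, 1.1), (1.2, 0.9), (0.9, 1.0),
          (5.0, 5.0), (5.2, 5.1), (5.1, 4.8), (6.0, 6.1), (6.2, 5.9), (5.9, 6.0)]
labels = ["a"] * 6 + ["b"] * 6

centroids, assign = kmeans(points, k=4)
committee = select_committee(points, labels, centroids, assign)
prediction_low = classify((0.5, 0.5), committee)
prediction_high = classify((5.5, 5.5), committee)
print(prediction_low, prediction_high)
```

K-Means is used here only because it is the simplest of the five methods named in the abstract; any of the others could in principle fill the clustering step, provided a representative can be chosen per cluster.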