Improved K-Means Algorithm on Home Industry Data Clustering in the Province of Bangka Belitung

Hadi Santoso, Hilyah Magdalena
Published in: 2020 International Conference on Smart Technology and Applications (ICoSTA), 2020-02-01
DOI: 10.1109/ICoSTA48221.2020.1570598913 (https://doi.org/10.1109/ICoSTA48221.2020.1570598913)
Citations: 3

Abstract

The Government of Bangka Belitung Islands Province has not yet classified its home industries. To address this, we propose a k-means algorithm for clustering home industry data. The k-means algorithm is widely used because it is straightforward and well suited to grouping data. However, it has a weakness in determining the initial cluster centers, which are still selected randomly. As a result, if the randomly chosen initial centroids are poor, the resulting grouping is suboptimal. Internal cluster validation is one way to determine the optimal number of clusters without prior information about the data. This study aims to identify the optimal grouping by improving the k-means algorithm and then testing it with two internal cluster validation measures, the Davies-Bouldin Index (DBI) and the Silhouette Index (SI), on home industry data from Bangka Belitung Islands Province. With the improved k-means algorithm, both the Silhouette Index and the DBI indicate an optimal number of clusters of k = 3, whereas with the traditional k-means algorithm both indices indicate k = 2. We conclude that k = 3, as indicated by the Davies-Bouldin Index on this research data, gives good results for clustering home industry data in Bangka Belitung Islands Province.
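
The two ideas in the abstract, sensitivity to the initial centroids and choosing k by internal validation, can be illustrated with a minimal Python sketch using scikit-learn. Several things here are assumptions rather than the authors' method: the abstract does not say how the improved algorithm selects its initial centroids, so k-means++ seeding is used purely as a stand-in; the data are synthetic blobs, not the home industry data; and the helper names score_k_range and best_k are hypothetical.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import davies_bouldin_score, silhouette_score


def score_k_range(X, k_values, init, seed=0):
    """Fit k-means once per k and return {k: (silhouette, dbi)}."""
    scores = {}
    for k in k_values:
        # n_init=1 keeps a single run, so the effect of the seeding is visible.
        model = KMeans(n_clusters=k, init=init, n_init=1, random_state=seed)
        labels = model.fit_predict(X)
        scores[k] = (
            silhouette_score(X, labels),
            davies_bouldin_score(X, labels),
        )
    return scores


def best_k(scores):
    """Silhouette is better when higher; DBI is better when lower."""
    k_sil = max(scores, key=lambda k: scores[k][0])
    k_dbi = min(scores, key=lambda k: scores[k][1])
    return k_sil, k_dbi


# Synthetic stand-in data with three well-separated groups (NOT the paper's data).
X, _ = make_blobs(
    n_samples=300,
    centers=[[0, 0], [10, 0], [0, 10]],
    cluster_std=1.0,
    random_state=42,
)

# "random" mimics traditional seeding; "k-means++" is an assumed stand-in
# for the paper's unspecified improvement.
for init in ("random", "k-means++"):
    scores = score_k_range(X, range(2, 7), init=init)
    k_sil, k_dbi = best_k(scores)
    print(f"init={init}: best k by Silhouette={k_sil}, best k by DBI={k_dbi}")
```

When the two indices agree on k, as the abstract reports for both algorithms, that agreement is a reasonable basis for choosing the number of clusters. When they disagree, the choice needs further inspection of the clusters themselves.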