The Application of Numerical Measure Variations in K-Means Clustering for Grouping Data

MATRIK : Jurnal Manajemen, Teknik Informatika dan Rekayasa Komputer Pub Date : 2023-11-20 DOI:10.30812/matrik.v23i1.3269

Relita Buaton, Solikhun Solikhun



Abstract

The K-Means Clustering algorithm is commonly used by researchers to group data. The problem addressed in this study is that it is not yet known how the choice of distance calculation affects the quality of the grouping produced by K-Means Clustering. This research compared seven distance calculations, based on types of numerical measures, within the K-Means algorithm: Euclidean Distance, Canberra Distance, Chebyshev Distance, Cosine Similarity, Dynamic Time Warping Distance, Jaccard Similarity, and Manhattan Distance. The best distance calculation was taken to be the one with the smallest Davies-Bouldin Index (DBI) value. The data used were cosmetics sales at Devi Cosmetics from January to April 2022, covering 56 product items. The result of this study is a comparison of numerical measures in the K-Means Clustering algorithm. The optimal clustering was obtained with the Euclidean Distance using 9 clusters, with a DBI value of 0.224. The Euclidean Distance also gave the best average DBI value, 0.265.
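
The comparison described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a plain K-Means loop with mean-based centroid updates for every distance, a Davies-Bouldin Index computed with the same distance that was used for clustering, and a small made-up data set (POINTS) in place of the Devi Cosmetics sales data. Only four of the seven measures are shown; Cosine Similarity, Dynamic Time Warping Distance, and Jaccard Similarity are omitted for brevity. All function and variable names are hypothetical.

```python
import math
import random

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))

def chebyshev(a, b):
    return max(abs(x - y) for x, y in zip(a, b))

def canberra(a, b):
    # Terms whose denominator is zero are skipped (a common convention).
    total = 0.0
    for x, y in zip(a, b):
        denom = abs(x) + abs(y)
        if denom:
            total += abs(x - y) / denom
    return total

def kmeans(points, k, dist, iters=100, seed=0):
    """Plain K-Means with a pluggable distance (assumption: mean centroids)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    labels = None
    for _ in range(iters):
        new_labels = [min(range(k), key=lambda j: dist(p, centroids[j]))
                      for p in points]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an emptied cluster keeps its previous centroid
                centroids[j] = tuple(sum(c) / len(members)
                                     for c in zip(*members))
    return labels, centroids

def davies_bouldin(points, labels, centroids, dist):
    """Davies-Bouldin Index: lower means more compact, better separated."""
    k = len(centroids)
    scatter = []
    for j in range(k):
        members = [p for p, lab in zip(points, labels) if lab == j]
        scatter.append(sum(dist(p, centroids[j]) for p in members)
                       / len(members) if members else 0.0)
    total = 0.0
    for i in range(k):
        worst = 0.0
        for j in range(k):
            if j == i:
                continue
            sep = dist(centroids[i], centroids[j])
            if sep > 0:  # skip coincident centroids to avoid dividing by zero
                worst = max(worst, (scatter[i] + scatter[j]) / sep)
        total += worst
    return total / k

# Made-up toy data, NOT the sales data used in the paper.
POINTS = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.0),
          (8.0, 8.0), (9.0, 9.0), (8.5, 9.5),
          (1.0, 9.0), (2.0, 8.5), (1.5, 9.5)]
DISTANCES = {"euclidean": euclidean, "manhattan": manhattan,
             "chebyshev": chebyshev, "canberra": canberra}

results = {}
for name, dist in DISTANCES.items():
    for k in (2, 3, 4):
        labels, centroids = kmeans(POINTS, k, dist)
        results[(name, k)] = davies_bouldin(POINTS, labels, centroids, dist)

best = min(results, key=results.get)  # smallest DBI wins
print("best (distance, k):", best, "DBI:", round(results[best], 3))
```

The selection rule mirrors the abstract: the distance and cluster count with the smallest DBI are treated as the best configuration. Averaging the DBI values per distance over all values of k would give a figure analogous to the average DBI the study reports.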

