Comparison of distance metric in k-mean algorithm for clustering wheat grain datasheet

Jurnal Teknik Informatika C.I.T Medicom, Pub Date: 2023-05-31, DOI: 10.35335/cit.vol15.2023.408.pp73-83

S. Suraya, Muhammad Sholeh, D. Andayati


Citations: 0

Abstract

Clustering is one of the data mining models; it groups data so that items in the same group are close to each other. This research clusters a wheat seed dataset, which contains data on various types of wheat, with the goal of building a clustering model. The K-means algorithm is used, and its results are compared across several distance metrics. The dataset is clustered with K-means for numbers of clusters ranging from k = 2 to k = 6, and the results are compared under three distance metrics: Euclidean distance, Manhattan distance, and Chebyshev distance. Every test run is evaluated to determine the best number of clusters, using the Davies-Bouldin index. The Davies-Bouldin value is computed for each clustering result, and the smallest value indicates the best clustering; if an evaluation result is negative, its absolute value is used. The research method follows Knowledge Discovery in Databases (KDD). The results show that the same dataset, clustered with the same K-means algorithm and evaluated with the same method, yields different evaluation values depending on the distance metric. The Euclidean, Manhattan, and Chebyshev metrics all give the best result at k = 2, so the study concludes that the wheat seed dataset is best clustered with k = 2.
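The comparison described above (K-means run for k = 2 to 6 under three distance metrics, scored with the Davies-Bouldin index, lowest score wins) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `kmeans`, `davies_bouldin` and `best_k` are hypothetical; initialisation is an assumed deterministic farthest-point scheme; the chosen metric only drives the assignment step, while centroids are still updated as coordinate-wise means (for Manhattan distance a k-medians update would be the metric-consistent alternative); and the Davies-Bouldin index is computed with Euclidean distance, as scikit-learn's `davies_bouldin_score` does, using DB = (1/k) Σᵢ maxⱼ≠ᵢ (Sᵢ + Sⱼ) / Mᵢⱼ, where Sᵢ is the mean distance of cluster i's points to its centroid and Mᵢⱼ the distance between centroids. The data is a hypothetical toy set, not the wheat dataset.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

Point = Tuple[float, ...]
Metric = Callable[[Sequence[float], Sequence[float]], float]


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    # L2 distance: square root of the summed squared differences
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a: Sequence[float], b: Sequence[float]) -> float:
    # L1 distance: sum of absolute coordinate differences
    return sum(abs(x - y) for x, y in zip(a, b))


def chebyshev(a: Sequence[float], b: Sequence[float]) -> float:
    # L-infinity distance: largest absolute coordinate difference
    return max(abs(x - y) for x, y in zip(a, b))


def mean(points: List[Point]) -> Point:
    # Coordinate-wise mean of a non-empty list of points
    return tuple(sum(coord) / len(points) for coord in zip(*points))


def kmeans(data: List[Point], k: int, metric: Metric, max_iter: int = 100):
    # Assumed deterministic farthest-point initialisation: start at data[0], then
    # keep adding the point whose nearest chosen centroid is farthest away.
    centroids = [data[0]]
    while len(centroids) < k:
        centroids.append(max(data, key=lambda p: min(metric(p, c) for c in centroids)))
    labels: List[int] = []
    for _ in range(max_iter):
        # Assignment step: the chosen metric decides the nearest centroid
        new_labels = [min(range(k), key=lambda j: metric(p, centroids[j])) for p in data]
        if new_labels == labels:
            break  # assignments stable: converged
        labels = new_labels
        # Update step: coordinate-wise mean; an empty cluster keeps its old centroid
        for j in range(k):
            members = [p for p, lab in zip(data, labels) if lab == j]
            if members:
                centroids[j] = mean(members)
    return labels, centroids


def davies_bouldin(data: List[Point], labels: List[int], dist: Metric = euclidean) -> float:
    # Group points by label (empty clusters simply do not appear)
    groups: Dict[int, List[Point]] = {}
    for p, lab in zip(data, labels):
        groups.setdefault(lab, []).append(p)
    clusters = list(groups.values())
    cents = [mean(g) for g in clusters]
    # S_i: average distance from the points of cluster i to its centroid
    scatter = [sum(dist(p, c) for p in g) / len(g) for g, c in zip(clusters, cents)]
    total = 0.0
    for i in range(len(clusters)):
        # D_i: largest ratio R_ij = (S_i + S_j) / M_ij over all other clusters j
        worst = 0.0
        for j in range(len(clusters)):
            if i != j:
                m = dist(cents[i], cents[j])
                worst = max(worst, (scatter[i] + scatter[j]) / m if m > 0 else math.inf)
        total += worst
    return total / len(clusters)


def best_k(data: List[Point], metric: Metric, k_values=range(2, 7)):
    # Cluster for each k, score with Davies-Bouldin; the lowest score wins
    scores = {k: davies_bouldin(data, kmeans(data, k, metric)[0]) for k in k_values}
    return min(scores, key=scores.get), scores


# Hypothetical toy data (NOT the wheat dataset): two 3x2 grids far apart
blob = [(float(x), float(y)) for x in range(3) for y in range(2)]
data = blob + [(x + 100.0, y + 100.0) for x, y in blob]

for name, metric in [("Euclidean", euclidean), ("Manhattan", manhattan), ("Chebyshev", chebyshev)]:
    k, scores = best_k(data, metric)
    print(f"{name}: best k = {k}, DB = {scores[k]:.4f}")
```

On this toy data every metric should select k = 2, because splitting either grid into smaller clusters raises the ratio of within-cluster scatter to centroid separation. Scores on the real wheat data would differ and are not reproduced here.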

