ANALISIS PERFORMA DAN KECEPATAN KOMPUTASI ALGORITMA K-MEANS DAN K-MEDOIDS PADA TEXT CLUSTERING

Pixel: Jurnal Ilmiah Komputer Grafis. Pub Date: 2022-12-07. DOI: 10.51903/pixel.v15i2.931

Karno Nur Cahyo, Agus Subekti, Muhammad Haris



Abstract

The large number of theses written by students at a university makes it difficult to find the categories of thesis topics. One use of text mining is to group thesis documents into clusters formed by a clustering algorithm. This study compares two clustering algorithms, K-Means and K-Medoids, evaluating their performance and computation time on thesis clustering so that related topics can be grouped with better clustering accuracy. The evaluation metric is the Davies-Bouldin Index (DBI), a technique for assessing clustering results, with the data split into training and testing sets by 10-fold cross-validation. With Term Occurrences as the term weighting and an N-Grams value of 2, K-Means achieves the better DBI value, -0.426, while K-Medoids under the same conditions has a DBI value of -1.631. However, the t-SNE visualization with the same supporting parameters suggests another usable option: 6 clusters, with a DBI value of -1.110. In the computation-time test on clustering 50 thesis documents, K-Means takes 2.5 seconds on average while K-Medoids takes 261.5 seconds on average. The test machine is an Asus ZenBook UX425EA.312 with an 11th Gen Intel® Core™ i5-1135G7 @ 2.40GHz processor, Intel® Iris® Xe Graphics, 8GB of RAM, and a 512GB SSD.
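To make the two main ingredients of the abstract concrete, the following is a minimal sketch, not the authors' implementation. It shows raw term-occurrence vectors over n-grams and the textbook Davies-Bouldin Index. The function names (`term_occurrences`, `vectorize`, `davies_bouldin`), the whitespace tokenizer, the use of Euclidean distance, and the reading of "N-Grams value 2" as "all n-grams up to length 2" are assumptions made for illustration. The textbook DBI is non-negative with lower values being better; the negative values quoted in the abstract presumably follow the sign convention of the tool the authors used, which the abstract does not specify.

```python
from collections import Counter
from math import sqrt


def term_occurrences(text, max_n=2):
    """Count every n-gram of length 1..max_n (raw counts, no TF-IDF weighting)."""
    tokens = text.lower().split()  # assumption: naive whitespace tokenizer
    counts = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            counts[" ".join(tokens[i:i + n])] += 1
    return counts


def vectorize(texts, max_n=2):
    """Turn documents into dense count vectors over a shared, sorted vocabulary."""
    bags = [term_occurrences(t, max_n) for t in texts]
    vocab = sorted(set().union(*bags))
    return [[float(bag[term]) for term in vocab] for bag in bags]


def euclidean(a, b):
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def davies_bouldin(points, labels):
    """Textbook Davies-Bouldin Index: lower means tighter, better-separated clusters.

    Assumes at least two clusters with distinct centroids.
    """
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)

    # Centroid of each cluster: the per-dimension mean of its points.
    centroids = {
        label: [sum(col) / len(members) for col in zip(*members)]
        for label, members in clusters.items()
    }
    # Scatter of each cluster: mean distance from its points to its centroid.
    scatter = {
        label: sum(euclidean(p, centroids[label]) for p in members) / len(members)
        for label, members in clusters.items()
    }

    keys = list(clusters)
    total = 0.0
    for i in keys:
        # Worst-case similarity ratio between cluster i and any other cluster.
        total += max(
            (scatter[i] + scatter[j]) / euclidean(centroids[i], centroids[j])
            for j in keys
            if j != i
        )
    return total / len(keys)
```

Given cluster labels produced by either K-Means or K-Medoids on the output of `vectorize`, calling `davies_bouldin(vectors, labels)` yields one score per run, so the two algorithms can be compared on the same documents.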

