Comparison of K-Means and K-Medoids Algorithms in Text Mining based on Davies Bouldin Index Testing for Classification of Student's Thesis

Digital Zone Jurnal Teknologi Informasi dan Komunikasi. Pub Date: 2022-05-27. DOI: 10.31849/digitalzone.v13i1.9292

Siti Ramadhani, Dini Azzahra, Tomi Z


Citations: 8

Abstract

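A thesis is a scientific work that a student compiles as a final project in the last stage of formal study, based on field research or observations on a topic within the study program. Because the number of theses is large, it is hard to find the category a thesis topic belongs to, or other titles relevant to it. Many theses, however, cover closely related topics, so grouping them makes it easier to find the topics relevant to a group. Text mining is one application of machine learning techniques to text, and clustering is one of the methods it offers: it groups objects into a chosen number of clusters. Several clustering algorithms exist. This study compares two of them, K-Means and K-Medoids, on the task of grouping theses, to determine which one produces groups with more closely related topics. The clusters are evaluated with the Davies Bouldin Index (DBI), a cluster validation technique; the computation time of the two algorithms serves as a second indicator. In the DBI test of the two algorithms, K-Medoids outperforms K-Means: its average DBI is 1.56, against 2.79 for K-Means (a lower DBI indicates more compact, better-separated clusters). In computation time for grouping 50 documents, K-Means outperforms K-Medoids: K-Means takes 1 second on average, while K-Medoids takes 26.7778 seconds.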
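The comparison described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract does not say how documents were turned into vectors or which tool was used, so the 2-D `points`, the hand-picked `init` centers, and every function name below are hypothetical. The K-Medoids variant is a simple alternating update rather than the full PAM swap search, and the DBI is computed against cluster means for both algorithms.

```python
import math
import time


def dist(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def centroid(members):
    """Component-wise mean of a non-empty list of vectors."""
    return [sum(col) / len(members) for col in zip(*members)]


def medoid(members):
    """Member with the smallest total distance to the other members."""
    return min(members, key=lambda c: sum(dist(c, p) for p in members))


def assign(points, centers):
    """Index of the nearest center for every point."""
    return [min(range(len(centers)), key=lambda j: dist(p, centers[j]))
            for p in points]


def cluster(points, init, update, max_iter=100):
    """Alternate assignment and center update until the centers stop moving.

    update=centroid gives K-Means (Lloyd's algorithm); update=medoid gives
    a simple alternating K-Medoids, not the full PAM swap search.
    """
    centers = [list(c) for c in init]
    for _ in range(max_iter):
        labels = assign(points, centers)
        groups = [[p for p, l in zip(points, labels) if l == j]
                  for j in range(len(centers))]
        # Keep the old center if a cluster ends up empty.
        new = [update(g) if g else centers[j] for j, g in enumerate(groups)]
        if new == centers:
            break
        centers = new
    return assign(points, centers)


def davies_bouldin(points, labels):
    """Davies Bouldin Index; lower means more compact, better separated.

    Assumes labels 0..k-1 are all used and no two centroids coincide.
    """
    k = max(labels) + 1
    groups = [[p for p, l in zip(points, labels) if l == j] for j in range(k)]
    cents = [centroid(g) for g in groups]
    scatter = [sum(dist(p, c) for p in g) / len(g)
               for g, c in zip(groups, cents)]
    worst = [max((scatter[i] + scatter[j]) / dist(cents[i], cents[j])
                 for j in range(k) if j != i)
             for i in range(k)]
    return sum(worst) / k


# Hypothetical toy data standing in for document feature vectors.
points = [[0, 0], [0, 1], [1, 0], [1, 1],
          [10, 10], [10, 11], [11, 10], [11, 11],
          [0, 10], [0, 11], [1, 10], [1, 11]]
init = [points[0], points[4], points[8]]  # one starting center per group

results = {}
for name, update in (("K-Means", centroid), ("K-Medoids", medoid)):
    start = time.perf_counter()
    labels = cluster(points, init, update)
    elapsed = time.perf_counter() - start
    results[name] = (labels, davies_bouldin(points, labels), elapsed)
    print(f"{name}: DBI={results[name][1]:.4f}, time={elapsed:.6f}s")
```

On data this cleanly separated both algorithms find the same three groups, so their DBI values match; the sketch only shows how the two indicators (DBI and wall-clock time) would be measured side by side. The medoid update compares every member with every other member of its cluster, which is one plausible reason K-Medoids would run slower on larger inputs.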




Source journal

Digital Zone Jurnal Teknologi Informasi dan Komunikasi

Self-citation rate

0.00%

Articles published

Review time

14 weeks