Authors: Deny Jollyta, Prihandoko Prihandoko, Dadang Priyanto, Alyauma Hajjah, Yulvia Nora Marlim
Journal: MATRIK : Jurnal Manajemen, Teknik Informatika dan Rekayasa Komputer, Volume 23, Issue 1
Published: 2023-11-20
DOI: https://doi.org/10.30812/matrik.v23i1.3078
Comparison of Distance Measurements Based on k-Numbers and Its Influence to Clustering
Heuristic data requires appropriate clustering methods so that the information produced by the grouping process is not open to doubt. Determining the optimal cluster from the grouping results remains challenging. This study analyzed four numerical distance measures against the categorical data patterns now available, in order to give users of heuristic data recommendations on how to derive knowledge or information from the best clusters. The method was clustering with four distance measures (Euclidean, Canberra, Manhattan, and Dynamic Time Warping), with the Elbow method used for optimization. The Elbow method with the Sum of Squared Errors (SSE) was used to determine the optimal number of clusters, tested from k = 2 to k = 10. The test data were student data from social media, used to help students achieve higher GPAs, and were collected through 300 completed questionnaires. The results showed that Manhattan distance was the best numerical measure, with the largest SSE of 45.359 and optimal clustering at k = 5. The optimal cluster produced with Manhattan distance consisted of students with GPAs above 3.00 and websites/vlogs used as learning tools by the mathematics and computer department. The number of clusters changes how close the attributes within each cluster are, which can affect each cluster's ability to yield information.
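The comparison described in the abstract can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the abstract does not state which clustering algorithm was used, so the sketch assumes a Lloyd-style (k-means-like) loop with a pluggable distance, mean centroids, and SSE taken as the sum of squared point-to-centroid distances. The function names (`cluster_sse`, `elbow_curve`) and the random questionnaire-style data are hypothetical.

```python
import math
import random


def euclidean(a, b):
    """Straight-line distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    """Sum of absolute coordinate differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


def canberra(a, b):
    """Weighted Manhattan distance; terms with a zero denominator are skipped."""
    total = 0.0
    for x, y in zip(a, b):
        denom = abs(x) + abs(y)
        if denom:
            total += abs(x - y) / denom
    return total


def dtw(a, b):
    """Dynamic Time Warping cost with absolute difference as the local cost."""
    n, m = len(a), len(b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            cost[i][j] = step + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def cluster_sse(points, k, dist, iters=20, seed=0):
    """Lloyd-style clustering with a pluggable distance; returns the final SSE.

    Assumption: centroids are coordinate means for every distance. This is a
    simplification, since the mean minimizes SSE only for Euclidean distance.
    """
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: dist(p, centers[c]))
            groups[nearest].append(p)
        # Recompute each center; an empty group keeps its previous center.
        centers = [
            [sum(col) / len(g) for col in zip(*g)] if g else centers[i]
            for i, g in enumerate(groups)
        ]
    return sum(min(dist(p, c) for c in centers) ** 2 for p in points)


def elbow_curve(points, dist, ks=range(2, 11)):
    """SSE for each candidate k (k = 2..10 by default, as in the abstract)."""
    return {k: cluster_sse(points, k, dist) for k in ks}


if __name__ == "__main__":
    # Hypothetical data: 60 respondents, 4 answers coded 1..5 (not the study's data).
    rng = random.Random(42)
    data = [[rng.randint(1, 5) for _ in range(4)] for _ in range(60)]
    for name, fn in [("euclidean", euclidean), ("manhattan", manhattan),
                     ("canberra", canberra), ("dtw", dtw)]:
        curve = elbow_curve(data, fn)
        print(name, {k: round(v, 2) for k, v in curve.items()})
```

Running the script prints one SSE curve per distance measure. In the usual Elbow reading, the chosen k is the point where the curve stops dropping sharply; the sketch leaves that judgment to the reader rather than encoding a selection rule the abstract does not specify.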