GBKM: A New Genetic Based K-Means Clustering Algorithm

2021 7th International Conference on Web Research (ICWR) Pub Date : 2021-05-19 DOI:10.1109/ICWR51868.2021.9443113

Mahnaz Mardi, M. Keyvanpour


Citations: 4

Abstract

Clustering is an unsupervised classification method that groups data into clusters so that the objects in each cluster are highly similar to one another but different from the objects in other clusters. Because clustering methods can handle massive amounts of information, many intelligent software agents use clustering techniques to filter, retrieve, and categorize documents on the World Wide Web. Web mining is generally classified under data mining. In data mining, one of the most significant centroid-based partitioning methods for clustering is the K-Means algorithm. One challenge of K-Means is its extreme sensitivity to the choice of initial cluster centers: if the initial centers are selected randomly, the algorithm may get stuck in a local optimum. A variant of K-Means, the K-Means++ algorithm, improves performance by choosing the initial cluster centroids more carefully. Evolutionary techniques are widely used to optimize clustering methods by providing their prerequisite parameters. The Genetic Algorithm is a stochastic, population-based method applied to optimization problems. This paper proposes a Genetic-Based K-Means (GBKM) clustering algorithm in which the cluster centroids are encoded by chromosomes rather than chosen as random initial cluster centroids. The cluster centers found by the Genetic Algorithm that maximize the fitness function are used as the initial points of the K-Means algorithm. The results show that, compared with four other clustering algorithms, this model helps improve the performance of K-Means through an appropriate choice of initial cluster centroids.
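The pipeline described in the abstract (chromosomes encoding cluster centroids, a Genetic Algorithm that maximizes a fitness function, and K-Means started from the fittest chromosome) can be illustrated with a minimal sketch. The abstract does not specify the fitness function or the genetic operators, so everything below is an assumption made for illustration, not the authors' implementation: the fitness is taken to be the inverse of the sum of squared errors (SSE), and the names `ga_init`, `kmeans`, `pop_size`, `generations`, and `mutation_rate`, as well as truncation selection, one-point crossover, and elitism, are hypothetical choices.

```python
import random

def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def sse(points, centroids):
    """Sum of squared distances from each point to its nearest centroid."""
    return sum(min(sq_dist(p, c) for c in centroids) for p in points)

def fitness(points, centroids):
    # Assumed fitness (not from the paper): maximizing it minimizes SSE.
    return 1.0 / (1.0 + sse(points, centroids))

def ga_init(points, k, pop_size=20, generations=30, mutation_rate=0.1, rng=None):
    """Search for good initial centroids; a chromosome is a list of k centroids."""
    rng = rng or random.Random(0)
    # Initial population: each chromosome holds k distinct data points.
    population = [rng.sample(points, k) for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(population, key=lambda ch: fitness(points, ch), reverse=True)
        parents = ranked[: pop_size // 2]      # truncation selection (assumed)
        children = [ranked[0]]                 # elitism: keep the best unchanged
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randint(1, k - 1) if k > 1 else 0
            child = a[:cut] + b[cut:]          # one-point crossover (assumed)
            # Mutation: occasionally replace a centroid with a random data point.
            child = [rng.choice(points) if rng.random() < mutation_rate else c
                     for c in child]
            children.append(child)
        population = children
    return max(population, key=lambda ch: fitness(points, ch))

def kmeans(points, centroids, max_iter=100):
    """Standard Lloyd iterations starting from the given centroids."""
    centroids = [tuple(c) for c in centroids]
    for _ in range(max_iter):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: sq_dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Recompute each centroid as its cluster mean; keep it if the cluster is empty.
        updated = [tuple(sum(x) / len(cl) for x in zip(*cl)) if cl else centroids[i]
                   for i, cl in enumerate(clusters)]
        if updated == centroids:
            break
        centroids = updated
    return centroids
```

Under these assumptions, a call such as `kmeans(points, ga_init(points, k=3))`, with `points` given as a list of coordinate tuples, runs K-Means from the fittest chromosome instead of from random centers. Because the best chromosome is carried over unchanged in every generation, the fitness of the returned initialization never falls below that of the best chromosome in the initial random population.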


