COMPARATIVE STUDY OF MUTATION OPERATORS IN THE GENETIC ALGORITHMS FOR THE K-MEANS PROBLEM

Impact factor: 0.3 · JCR: Q3 (Mathematics)

Facta Universitatis-Series Mathematics and Informatics · Pub Date: 2021-02-02 · DOI: 10.22190/FUMI2004091L

Ri-Zhi Li, L. Kazakovtsev

Abstract

The k-means problem and the algorithm of the same name are the most commonly used clustering model and algorithm. Being a local search optimization method, the k-means algorithm converges only to a local minimum of the objective function (the sum of squared errors), and its result depends on the initial solution, which is either given or selected randomly. This disadvantage can be avoided by combining the algorithm with more sophisticated methods such as Variable Neighborhood Search, agglomerative or dissociative heuristic approaches, genetic algorithms, etc. To address the shortcomings of the k-means algorithm while combining its advantages with those of the evolutionary approach, a genetic clustering algorithm with a cross-mutation operator was designed. The efficiency of genetic algorithms with tournament selection, one-point crossover, and various mutation operators (no mutation operator, uniform mutation, DBM mutation, and the new cross-mutation) is compared on data sets of up to 2 million data vectors. We used data from the UCI repository and a special data set collected during the testing of highly reliable semiconductor components. In this paper, we discuss neither the efficiency of genetic algorithms for the k-means problem relative to other (non-genetic) algorithms nor the comparative adequacy of the k-means clustering model. Here, we focus only on the influence of various mutation operators on the efficiency of the genetic algorithms.
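
To make the ingredients named in the abstract concrete, the following is a minimal Python sketch of a genetic algorithm for the k-means problem with tournament selection, one-point crossover, and uniform mutation, where each individual is a set of k centers and fitness is the sum of squared errors. It is an illustration under assumptions, not the authors' implementation: all function names, the population size, the mutation rate, the tournament size, and the replace-the-worst rule are hypothetical choices. The DBM mutation and the new cross-mutation operator are not sketched, because the abstract does not define them. The dense distance matrix also limits this sketch to small data sets, far below the 2 million vectors used in the paper.

```python
import numpy as np

def sse(data, centers):
    """Sum of squared distances from each point to its nearest center."""
    d2 = ((data[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d2.min(axis=1).sum()

def kmeans(data, centers, iters=20):
    """Plain Lloyd iterations: a local search that reaches only a local minimum."""
    centers = centers.copy()
    for _ in range(iters):
        d2 = ((data[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        for j in range(len(centers)):
            members = data[labels == j]
            if len(members):  # an empty cluster keeps its previous center
                centers[j] = members.mean(axis=0)
    return centers

def tournament(pop, fitness, rng, size=2):
    """Pick `size` random individuals and return the one with the lowest SSE."""
    idx = rng.choice(len(pop), size=size, replace=False)
    return pop[min(idx, key=lambda i: fitness[i])]

def one_point_crossover(a, b, rng):
    """Take the first `cut` centers from parent a and the rest from parent b (needs k >= 2)."""
    cut = rng.integers(1, len(a))
    return np.vstack([a[:cut], b[cut:]])

def uniform_mutation(centers, data, rng, rate=0.1):
    """With probability `rate`, replace a center by a randomly chosen data point."""
    child = centers.copy()
    for j in range(len(child)):
        if rng.random() < rate:
            child[j] = data[rng.integers(len(data))]
    return child

def genetic_kmeans(data, k, pop_size=8, generations=15, seed=0):
    """Assumed steady-state scheme: each child replaces the worst individual if it is better."""
    rng = np.random.default_rng(seed)
    pop = [kmeans(data, data[rng.choice(len(data), k, replace=False)])
           for _ in range(pop_size)]
    fitness = [sse(data, c) for c in pop]
    for _ in range(generations):
        p1 = tournament(pop, fitness, rng)
        p2 = tournament(pop, fitness, rng)
        child = uniform_mutation(one_point_crossover(p1, p2, rng), data, rng)
        child = kmeans(data, child)  # refine the offspring by local search
        f = sse(data, child)
        worst = int(np.argmax(fitness))
        if f < fitness[worst]:
            pop[worst], fitness[worst] = child, f
    best = int(np.argmin(fitness))
    return pop[best], fitness[best]
```

Calling `genetic_kmeans(data, k=3)` on a float array of shape `(n, d)` returns the best set of centers found and its sum of squared errors. Setting `rate=0` in `uniform_mutation` gives the "without any mutation operator" variant mentioned in the abstract, which is one simple way to compare the two settings on the same data.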
