An Evolutionary Algorithm for Automatic Recommendation of Clustering Methods and its Parameters
Jessica A. Carballido, Macarena A. Latini, Ignacio Ponzoni, Rocío L. Cecchini
Electronic Notes in Discrete Mathematics. Published 2018-08-01. DOI: 10.1016/j.endm.2018.07.030
https://www.sciencedirect.com/science/article/pii/S1571065318301744
Citations: 2
Abstract
A central problem in data clustering is determining the best clustering method together with the ideal number (k) of groups into which the data should be separated. This paper presents a preliminary version of a clustering recommender method which, starting from a set of standardized data, suggests the best clustering strategy and proposes an advisable k value. To this end, the algorithm considers four indices for evaluating the final cluster structure: Dunn, Silhouette, Widest Gap and Entropy. The prototype is implemented as a Genetic Algorithm in which individuals are possible configurations of the methods and their parameters. In this first prototype, the algorithm chooses among four partitioning methods, namely K-means, PAM, CLARA and Fanny, and also obtains the best set of parameters with which to execute the suggested method. The prototype was developed in an R environment, and its findings were consistent with a combination of results provided by other methods with similar objectives. The prototype is intended to serve as the initial basis for a more complex framework that also incorporates the reduction of matrices with vast numbers of rows.
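To make the idea of "individuals as configurations of methods and parameters" concrete, the following is a minimal Python sketch, not the authors' R prototype. It rests on several assumptions that do not come from the abstract: an individual is a (method, k) pair, fitness is the Silhouette index alone (the paper combines four indices), and only two methods are included, a plain k-means and a simplified k-medoids standing in for PAM. The selection, crossover and mutation scheme, and all function names, are hypothetical.

```python
import math
import random

def assign(points, centers):
    # Label each point with the index of its nearest center.
    return [min(range(len(centers)), key=lambda c: math.dist(p, centers[c]))
            for p in points]

def kmeans(points, k, rng, iters=20):
    centers = rng.sample(points, k)
    for _ in range(iters):
        labels = assign(points, centers)
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous center
                centers[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return assign(points, centers)

def kmedoids(points, k, rng, iters=20):
    # Simplified alternating k-medoids; an assumption standing in for PAM.
    medoids = rng.sample(points, k)
    for _ in range(iters):
        labels = assign(points, medoids)
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                medoids[c] = min(
                    members, key=lambda m: sum(math.dist(m, q) for q in members))
    return assign(points, medoids)

METHODS = {"kmeans": kmeans, "kmedoids": kmedoids}

def silhouette(points, labels):
    # Mean silhouette width; singleton clusters contribute 0.
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        return -1.0
    total = 0.0
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            continue
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        b = min(sum(math.dist(p, q) for q in other) / len(other)
                for key, other in clusters.items() if key != lab)
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)

def fitness(individual, points, seed=0):
    # Assumption: fitness is the silhouette of one seeded run of the method.
    method, k = individual
    labels = METHODS[method](points, k, random.Random(seed))
    return silhouette(points, labels)

def recommend(points, k_max=6, pop_size=8, generations=10, seed=0):
    rng = random.Random(seed)
    names = sorted(METHODS)

    def random_individual():
        return (rng.choice(names), rng.randint(2, k_max))

    population = [random_individual() for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=lambda ind: fitness(ind, points), reverse=True)
        parents = population[:pop_size // 2]  # elitist truncation selection
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            child = (a[0], b[1])  # crossover: method from a, k from b
            if rng.random() < 0.3:  # mutation: replace with a random individual
                child = random_individual()
            children.append(child)
        population = parents + children
    return max(population, key=lambda ind: fitness(ind, points))

# Toy data: three noisy blobs (illustrative only, not from the paper).
data_rng = random.Random(1)
points = [(cx + data_rng.gauss(0, 0.5), cy + data_rng.gauss(0, 0.5))
          for cx, cy in [(0, 0), (10, 10), (20, 0)] for _ in range(15)]
best = recommend(points)
print("recommended (method, k):", best)
```

Because the best half of each generation is carried over unchanged, the top-scoring configuration is never lost between generations. Extending the sketch toward the paper's setting would mean adding the remaining methods and combining Dunn, Widest Gap and Entropy with Silhouette in the fitness function; how the authors combine them is not stated in the abstract.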
About the journal:
Electronic Notes in Discrete Mathematics is a venue for the rapid electronic publication of the proceedings of conferences, of lecture notes, monographs and other similar material for which quick publication is appropriate. Organizers of conferences whose proceedings appear in Electronic Notes in Discrete Mathematics, and authors of other material appearing as a volume in the series are allowed to make hard copies of the relevant volume for limited distribution. For example, conference proceedings may be distributed to participants at the meeting, and lecture notes can be distributed to those taking a course based on the material in the volume.