An Evolutionary Algorithm for Automatic Recommendation of Clustering Methods and its Parameters

Q2 Mathematics
Jessica A. Carballido, Macarena A. Latini, Ignacio Ponzoni, Rocío L. Cecchini
Journal: Electronic Notes in Discrete Mathematics
Published: 2018-08-01
DOI: 10.1016/j.endm.2018.07.030
Source: https://www.sciencedirect.com/science/article/pii/S1571065318301744
Citations: 2

Abstract


One of the main problems faced when clustering data is determining the best clustering method together with the ideal number (k) of groups into which the data should be separated. This paper presents a preliminary version of a clustering recommender method that, starting from a set of standardized data, suggests the best clustering strategy and proposes an advisable value of k. To this end, the algorithm considers four indices for evaluating the final cluster structure: Dunn, Silhouette, Widest Gap, and Entropy. The prototype is implemented as a Genetic Algorithm in which individuals are possible configurations of the methods and their parameters. In this first prototype, the algorithm chooses among four partitioning methods, namely K-means, PAM, CLARA, and Fanny. It also obtains the best set of parameters with which to run the suggested method. The prototype was developed in an R environment, and its findings were consistent with a combination of results provided by other methods with similar objectives. The prototype is intended to serve as the initial basis for a more complex framework that also incorporates the reduction of matrices with vast numbers of rows.
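The idea of evolving (method, k) configurations can be illustrated with a small sketch. This is not the authors' prototype, which was written in R and scores candidates with four indices. The version below is a hypothetical Python illustration that assumes a two-gene individual `(method, k)`, uses only the Silhouette index as fitness, and covers only K-means and a simplified medoid-based stand-in for PAM (CLARA and Fanny are omitted). All function names, the population size, and the mutation rate are assumptions made for the example.

```python
import random
from math import dist

# Assumption: only two of the four methods named in the abstract are sketched.
METHODS = ("kmeans", "pam")

def assign(points, centers):
    """Label each point with the index of its nearest center."""
    return [min(range(len(centers)), key=lambda c: dist(p, centers[c]))
            for p in points]

def cluster(points, method, k, rng, iters=10):
    """Alternate assignment and center update.
    'pam' is a simplified medoid update, not the full PAM swap search."""
    centers = rng.sample(points, k)
    for _ in range(iters):
        labels = assign(points, centers)
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if not members:
                continue  # keep the old center for an empty cluster
            if method == "kmeans":
                centers[c] = tuple(sum(x) / len(members) for x in zip(*members))
            else:
                centers[c] = min(members,
                                 key=lambda m: sum(dist(m, q) for q in members))
    return assign(points, centers)

def silhouette(points, labels):
    """Mean silhouette width; points in singleton clusters contribute 0."""
    clusters = {}
    for p, l in zip(points, labels):
        clusters.setdefault(l, []).append(p)
    if len(clusters) < 2:
        return -1.0  # a single cluster is treated as the worst case
    total = 0.0
    for p, l in zip(points, labels):
        own = clusters[l]
        if len(own) == 1:
            continue
        a = sum(dist(p, q) for q in own) / (len(own) - 1)
        b = min(sum(dist(p, q) for q in other) / len(other)
                for m, other in clusters.items() if m != l)
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)

def fitness(ind, points, seed=0):
    """Score one (method, k) individual; a fixed seed keeps it deterministic."""
    method, k = ind
    return silhouette(points, cluster(points, method, k, random.Random(seed)))

def recommend(points, k_max=6, pop_size=8, generations=10, seed=0):
    """Evolve (method, k) pairs and return the best one found."""
    rng = random.Random(seed)
    pop = [(rng.choice(METHODS), rng.randint(2, k_max)) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda ind: fitness(ind, points), reverse=True)
        parents = pop[: pop_size // 2]  # truncation selection keeps the best half
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            child = (a[0], b[1])  # crossover: method from one parent, k from the other
            if rng.random() < 0.3:  # mutation: re-draw one gene at random
                if rng.random() < 0.5:
                    child = (rng.choice(METHODS), child[1])
                else:
                    child = (child[0], rng.randint(2, k_max))
            children.append(child)
        pop = parents + children
    return max(pop, key=lambda ind: fitness(ind, points))
```

Calling `recommend(points)` on a list of coordinate tuples (with at least `k_max` points) returns a `(method, k)` pair. A fuller version would presumably combine several validity indices into the fitness, as the abstract describes, rather than relying on Silhouette alone.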

Source journal
Electronic Notes in Discrete Mathematics (Mathematics: Discrete Mathematics and Combinatorics)
CiteScore: 1.30
Self-citation rate: 0.00%
Articles published: 0
About the journal: Electronic Notes in Discrete Mathematics is a venue for the rapid electronic publication of the proceedings of conferences, of lecture notes, monographs and other similar material for which quick publication is appropriate. Organizers of conferences whose proceedings appear in Electronic Notes in Discrete Mathematics, and authors of other material appearing as a volume in the series are allowed to make hard copies of the relevant volume for limited distribution. For example, conference proceedings may be distributed to participants at the meeting, and lecture notes can be distributed to those taking a course based on the material in the volume.