A Comparison Study of Validity Indices on Swarm-Intelligence-Based Clustering.

IEEE Transactions on Systems Man and Cybernetics Part B-Cybernetics. Pub Date: 2012-08-01. Epub Date: 2012-03-15. DOI: 10.1109/TSMCB.2012.2188509

Rui Xu, Jie Xu, D C Wunsch

Pages: 1243-56. Publication type: Journal Article.

Citations: 122

Abstract

Swarm intelligence has emerged as a worthwhile class of clustering methods due to its convenient implementation, parallel capability, ability to avoid local minima, and other advantages. In such applications, clustering validity indices usually operate as fitness functions that evaluate the quality of the obtained clusters. However, because validity indices are usually data dependent and designed to address certain types of data, the choice of index as the fitness function may critically affect cluster quality. Here, we compare the performance of eight well-known and widely used clustering validity indices, namely, the Caliński-Harabasz index, the CS index, the Davies-Bouldin index, the Dunn index with two of its generalized versions, the I index, and the silhouette statistic index, on both synthetic and real data sets in the framework of differential-evolution-particle-swarm-optimization (DEPSO)-based clustering. DEPSO is a hybrid evolutionary algorithm that combines a stochastic optimization approach (differential evolution) with a swarm intelligence method (particle swarm optimization) to further increase the search capability and achieve higher flexibility in exploring the problem space. According to the experimental results, the silhouette statistic index stands out on most of the data sets that we examined. Nevertheless, we suggest that users not base their conclusions on a single index, but consider the results of several indices in order to obtain reliable clustering structures.
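To make the role of a validity index as a fitness function concrete, the following is a minimal sketch, not the authors' implementation. It assumes a candidate solution (for example, one particle) encodes a fixed number of centroids, assigns each point to its nearest centroid, and scores the resulting partition with the mean silhouette width. The function names (`assign`, `silhouette`, `fitness`), the toy data, and the penalty for one-cluster partitions are illustrative assumptions; the DEPSO search itself is not shown.

```python
import math

def assign(points, centroids):
    """Label each point with the index of its nearest centroid."""
    return [min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
            for p in points]

def silhouette(points, labels):
    """Mean silhouette width of a partition; lies in [-1, 1], higher is better."""
    clusters = sorted(set(labels))
    if len(clusters) < 2:
        return -1.0  # assumption: penalize degenerate one-cluster partitions
    total = 0.0
    for i, p in enumerate(points):
        # Mean distance from p to the members of each cluster, excluding p itself.
        mean_dist = {}
        for c in clusters:
            members = [q for j, q in enumerate(points) if labels[j] == c and j != i]
            if members:
                mean_dist[c] = sum(math.dist(p, q) for q in members) / len(members)
        own = labels[i]
        if own not in mean_dist:
            continue  # singleton cluster: its silhouette is 0 by convention
        a = mean_dist[own]                                       # cohesion
        b = min(d for c, d in mean_dist.items() if c != own)     # separation
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)

def fitness(centroids, points):
    """Fitness of one candidate centroid set: silhouette of the partition it induces."""
    return silhouette(points, assign(points, centroids))

# Toy data (hypothetical): two well-separated groups of three points each.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
good = [(0.3, 0.3), (10.3, 10.3)]   # one centroid per group
poor = [(0.0, 0.0), (1.0, 0.0)]     # both centroids inside the first group

print("good candidate:", fitness(good, points))
print("poor candidate:", fitness(poor, points))
```

A population-based optimizer would call `fitness` on each candidate and move the population toward higher scores. Swapping in another index (for instance Davies-Bouldin, which is minimized rather than maximized) changes only the `fitness` function, which is why the choice of index can change the clusters the search converges to.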

