On Monotonic Tendency of Some Fuzzy Cluster Validity Indices for High-Dimensional Data

2018 7th Brazilian Conference on Intelligent Systems (BRACIS). Publication date: 2018-10-01. DOI: 10.1109/BRACIS.2018.00102

Fernanda Eustáquio, T. Nogueira

Citations: 5

Abstract

Fuzzy clustering validation of high-dimensional data sets is only possible with a reliable cluster validity index. Therefore, selecting an index is as important as choosing an appropriate clustering algorithm. A good validity index is one that correctly recognizes the data structure by choosing the correct number of clusters and is not sensitive to any parameter of the clustering algorithm or property of the data. However, some classical fuzzy validity indices, such as the Partition Coefficient (PC), Partition Entropy (PE) and Fukuyama-Sugeno (FS) index, are sensitive to the fuzzification factor m and the number of clusters c, both parameters of the well-known Fuzzy c-Means (FCM) algorithm. They exhibit a monotonic tendency as a function of c, even when m is varied: the PC and FS values decrease as c increases, while PE behaves the opposite way. Although this tendency has been investigated extensively in the literature, those studies were conducted on low-dimensional data, where dimensionality does not affect clustering behavior. To investigate how these aspects affect the fuzzy clustering of high-dimensional data, in this work we clustered the objects of ten real high-dimensional data sets using FCM, validated by PC, PE, FS and some proposed modifications of them designed to deal with the monotonic tendency. The results showed that the Modified Partition Coefficient (MPC) is the most reliable index for validating fuzzy clustering of high-dimensional data.
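
The abstract does not give the authors' formulas or code, so the following is only a minimal pure-Python sketch, assuming the standard textbook definitions. FCM alternates centre updates v_i = sum_k u_ik^m x_k / sum_k u_ik^m with membership updates u_ik = 1 / sum_j (d_ik / d_jk)^(2/(m-1)). PC = (1/n) sum_i sum_k u_ik^2 lies in [1/c, 1] (higher is better); PE = -(1/n) sum_i sum_k u_ik log u_ik lies in [0, log c] (lower is better); MPC = 1 - c/(c-1) * (1 - PC) rescales PC to [0, 1], removing its c-dependent lower bound of 1/c. FS = sum_i sum_k u_ik^m (||x_k - v_i||^2 - ||v_i - vbar||^2) (lower is better), where vbar is assumed here to be the mean of the cluster centres; some references use the grand mean of the data instead. All function names, the random initialisation, the tolerance and the synthetic demo data are hypothetical illustration choices, not the paper's experimental setup.

```python
import math
import random


def _centres(data, U, m):
    """Centre update: v_i = sum_k u_ik^m x_k / sum_k u_ik^m.
    (Divides by zero if a cluster's memberships all collapse to 0; not guarded.)"""
    out = []
    for row in U:
        w = [u ** m for u in row]
        tw = sum(w)
        out.append([sum(wk * x[d] for wk, x in zip(w, data)) / tw
                    for d in range(len(data[0]))])
    return out


def fcm(data, c, m=2.0, max_iter=300, tol=1e-6, seed=0):
    """Minimal Fuzzy c-Means sketch (requires m > 1). Returns (centres, U) with
    U[i][k] the membership of point k in cluster i; each column of U sums to 1."""
    rng = random.Random(seed)
    n = len(data)
    U = [[rng.random() for _ in range(n)] for _ in range(c)]
    for k in range(n):  # normalise the random initial memberships per point
        s = sum(U[i][k] for i in range(c))
        for i in range(c):
            U[i][k] /= s
    p = 2.0 / (m - 1.0)
    for _ in range(max_iter):
        centres = _centres(data, U, m)
        max_change = 0.0
        for k, x in enumerate(data):
            d = [math.dist(x, v) for v in centres]
            if min(d) == 0.0:  # point sits on a centre: crisp membership
                hit = d.index(0.0)
                new = [1.0 if i == hit else 0.0 for i in range(c)]
            else:  # u_ik = 1 / sum_j (d_ik / d_jk)^(2/(m-1))
                new = [1.0 / sum((d[i] / d[j]) ** p for j in range(c))
                       for i in range(c)]
            for i in range(c):
                max_change = max(max_change, abs(new[i] - U[i][k]))
                U[i][k] = new[i]
        if max_change < tol:  # stop once memberships stabilise
            break
    return _centres(data, U, m), U


def partition_coefficient(U):
    """PC = (1/n) sum_i sum_k u_ik^2, in [1/c, 1]; higher is better."""
    return sum(u * u for row in U for u in row) / len(U[0])


def partition_entropy(U):
    """PE = -(1/n) sum_i sum_k u_ik log u_ik, in [0, log c]; lower is better."""
    return -sum(u * math.log(u) for row in U for u in row if u > 0) / len(U[0])


def modified_partition_coefficient(U):
    """MPC = 1 - c/(c-1) * (1 - PC): PC rescaled from [1/c, 1] to [0, 1]."""
    c = len(U)
    return 1.0 - c / (c - 1.0) * (1.0 - partition_coefficient(U))


def fukuyama_sugeno(data, centres, U, m=2.0):
    """FS = sum_i sum_k u_ik^m (||x_k - v_i||^2 - ||v_i - vbar||^2); lower is
    better. vbar is the mean of the centres here (an assumed variant)."""
    c, dim = len(centres), len(centres[0])
    vbar = [sum(v[d] for v in centres) / c for d in range(dim)]
    total = 0.0
    for i, v in enumerate(centres):
        sep = math.dist(v, vbar) ** 2
        for k, x in enumerate(data):
            total += U[i][k] ** m * (math.dist(x, v) ** 2 - sep)
    return total


if __name__ == "__main__":
    # Hypothetical demo data: three Gaussian blobs in 20 dimensions.
    rng = random.Random(42)
    data = [[rng.gauss(mu, 1.0) for _ in range(20)]
            for mu in (0.0, 5.0, 10.0) for _ in range(20)]
    for c in range(2, 6):
        centres, U = fcm(data, c, m=2.0)
        print(c, round(partition_coefficient(U), 3),
              round(partition_entropy(U), 3),
              round(modified_partition_coefficient(U), 3),
              round(fukuyama_sugeno(data, centres, U), 1))
```

Running the script prints the four index values for each c on the synthetic data, which is one way to inspect whether PC, PE and FS drift with c; it makes no claim about what the paper's ten real data sets show. Pure Python was chosen to keep the sketch dependency-free; a NumPy version would be far faster on genuinely high-dimensional data.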
