On Monotonic Tendency of Some Fuzzy Cluster Validity Indices for High-Dimensional Data
Authors: Fernanda Eustáquio, T. Nogueira
Published in: 2018 7th Brazilian Conference on Intelligent Systems (BRACIS), 2018-10-01
DOI: https://doi.org/10.1109/BRACIS.2018.00102
Citations: 5
Abstract
Validating a fuzzy clustering of high-dimensional data sets is possible only with a reliable cluster validity index. Therefore, selecting an index is as important as choosing an appropriate clustering algorithm. A good validity index is one that correctly recognizes the data structure by identifying the correct number of clusters and that is not sensitive to any parameter of the clustering algorithm or to any property of the data. However, some classical fuzzy validity indices, such as the Partition Coefficient (PC), Partition Entropy (PE) and Fukuyama-Sugeno (FS), are sensitive to the fuzzification factor m and the number of clusters c, both parameters of the well-known Fuzzy c-Means (FCM) algorithm. They exhibit a monotonic tendency as a function of c, even when the value of m is varied: the PC and FS values become smaller as c increases, and the opposite occurs with PE. Although the literature presents extensive investigations of this tendency, they were conducted on low-dimensional data, for which dimensionality does not affect the clustering behavior. To investigate how these aspects affect the fuzzy clustering results for high-dimensional data, in this work we clustered the objects of ten real high-dimensional data sets using FCM, validated by PC, PE, FS and some modifications of them that have been proposed to deal with the monotonic tendency. The results showed that the Modified Partition Coefficient (MPC) is the most reliable index for validating fuzzy clustering of high-dimensional data.
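The dependence of PC and PE on c can be illustrated with a minimal sketch. It assumes the standard textbook definitions, which the abstract itself does not state: PC = (1/n) Σ u_ik², PE = -(1/n) Σ u_ik ln u_ik, and MPC = 1 - c/(c-1) (1 - PC). The function names and the helper `uniform_partition` are hypothetical and chosen only for illustration; the membership matrix is a plain list of rows, one per object, and this is not the authors' experimental code.

```python
import math


def partition_coefficient(u):
    """PC: mean of squared memberships; lies in [1/c, 1]."""
    n = len(u)
    return sum(x * x for row in u for x in row) / n


def partition_entropy(u):
    """PE: mean membership entropy; lies in [0, ln c]. Zero memberships contribute 0."""
    n = len(u)
    return -sum(x * math.log(x) for row in u for x in row if x > 0) / n


def modified_partition_coefficient(u):
    """MPC: PC rescaled to [0, 1] so that its range no longer depends on c (assumed definition)."""
    c = len(u[0])
    return 1.0 - (c / (c - 1.0)) * (1.0 - partition_coefficient(u))


def uniform_partition(n, c):
    """Hypothetical worst case: every object belongs equally to all c clusters."""
    return [[1.0 / c] * c for _ in range(n)]


if __name__ == "__main__":
    # For a completely fuzzy partition, PC equals 1/c and PE equals ln c,
    # so both drift with c even though the partition is equally uninformative.
    # MPC stays at 0 (up to floating-point rounding).
    for c in (2, 4, 8):
        u = uniform_partition(10, c)
        print(c,
              partition_coefficient(u),
              partition_entropy(u),
              modified_partition_coefficient(u))
```

Under these assumed definitions, the lower bound of PC shrinks and the upper bound of PE grows as c increases, which is one way to see why the raw indices favor particular values of c. The rescaling in MPC removes that dependence of the range on c.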