{"title":"基于模糊制粒-脱粒准则的聚类有效性新指标","authors":"S. Saha, S. Bandyopadhyay","doi":"10.1109/adcom.2007.19","DOIUrl":null,"url":null,"abstract":"Identification of correct number of clusters and the corresponding partitioning are two important considerations in clustering. In this paper, a new fuzzy quantization-dequantization criterion is used to propose a cluster validity index named fuzzy vector quantization based validity index, FVQ index. This index identifies how well the formed cluster centers represent that particular data set. In general, most of the existing validity indices try to optimize the total variance of the partitioning which is a measure of compactness of the clusters so formed. Here a new kind of error function which reflects how well the formed cluster centers represent the whole data set is used as the goodness of the obtained partitioning. This error function is monotonically decreasing with increase in the number of clusters. Minimum separation between two cluster centers is used here to normalize the error function. The well-known genetic algorithm based K-means clustering algorithm (GAK-means) is used as the underlying partitioning technique. The number of clusters is varied from 2 to radicN where N is the total number of data points present in the data set and the values of the proposed validity index is noted down. The minimum value of the FVQ index over these radicN-1 partitions corresponds to the appropriate partitioning and the number of partitions as indicated by the validity index. Results on five artificially generated and three real-life data sets show the effectiveness of the proposed validity index. For the purpose of comparison the cluster number identified by a well-known cluster validity index, XB-index, for the above mentioned eight data sets are also reported.","PeriodicalId":185608,"journal":{"name":"15th International Conference on Advanced Computing and Communications (ADCOM 2007)","volume":"51 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"15","resultStr":"{\"title\":\"A New Cluster Validity Index Based on Fuzzy Granulation-degranulation Criteria\",\"authors\":\"S. Saha, S. Bandyopadhyay\",\"doi\":\"10.1109/adcom.2007.19\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Identification of correct number of clusters and the corresponding partitioning are two important considerations in clustering. In this paper, a new fuzzy quantization-dequantization criterion is used to propose a cluster validity index named fuzzy vector quantization based validity index, FVQ index. This index identifies how well the formed cluster centers represent that particular data set. In general, most of the existing validity indices try to optimize the total variance of the partitioning which is a measure of compactness of the clusters so formed. Here a new kind of error function which reflects how well the formed cluster centers represent the whole data set is used as the goodness of the obtained partitioning. This error function is monotonically decreasing with increase in the number of clusters. Minimum separation between two cluster centers is used here to normalize the error function. The well-known genetic algorithm based K-means clustering algorithm (GAK-means) is used as the underlying partitioning technique. The number of clusters is varied from 2 to radicN where N is the total number of data points present in the data set and the values of the proposed validity index is noted down. The minimum value of the FVQ index over these radicN-1 partitions corresponds to the appropriate partitioning and the number of partitions as indicated by the validity index. Results on five artificially generated and three real-life data sets show the effectiveness of the proposed validity index. For the purpose of comparison the cluster number identified by a well-known cluster validity index, XB-index, for the above mentioned eight data sets are also reported.\",\"PeriodicalId\":185608,\"journal\":{\"name\":\"15th International Conference on Advanced Computing and Communications (ADCOM 2007)\",\"volume\":\"51 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"1900-01-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"15\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"15th International Conference on Advanced Computing and Communications (ADCOM 2007)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/adcom.2007.19\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"15th International Conference on Advanced Computing and Communications (ADCOM 2007)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/adcom.2007.19","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
A New Cluster Validity Index Based on Fuzzy Granulation-degranulation Criteria
Identification of correct number of clusters and the corresponding partitioning are two important considerations in clustering. In this paper, a new fuzzy quantization-dequantization criterion is used to propose a cluster validity index named fuzzy vector quantization based validity index, FVQ index. This index identifies how well the formed cluster centers represent that particular data set. In general, most of the existing validity indices try to optimize the total variance of the partitioning which is a measure of compactness of the clusters so formed. Here a new kind of error function which reflects how well the formed cluster centers represent the whole data set is used as the goodness of the obtained partitioning. This error function is monotonically decreasing with increase in the number of clusters. Minimum separation between two cluster centers is used here to normalize the error function. The well-known genetic algorithm based K-means clustering algorithm (GAK-means) is used as the underlying partitioning technique. The number of clusters is varied from 2 to radicN where N is the total number of data points present in the data set and the values of the proposed validity index is noted down. The minimum value of the FVQ index over these radicN-1 partitions corresponds to the appropriate partitioning and the number of partitions as indicated by the validity index. Results on five artificially generated and three real-life data sets show the effectiveness of the proposed validity index. For the purpose of comparison the cluster number identified by a well-known cluster validity index, XB-index, for the above mentioned eight data sets are also reported.