A New Cluster Validity Index Based on Fuzzy Granulation-degranulation Criteria

15th International Conference on Advanced Computing and Communications (ADCOM 2007) · DOI: 10.1109/adcom.2007.19

S. Saha, S. Bandyopadhyay


Citations: 15

Abstract


A New Cluster Validity Index Based on Fuzzy Granulation-degranulation Criteria

Identifying the correct number of clusters and the corresponding partitioning are two important considerations in clustering. In this paper, a new fuzzy quantization-dequantization criterion is used to propose a cluster validity index, the fuzzy vector quantization based validity index (FVQ index). This index measures how well the formed cluster centers represent the data set. Most existing validity indices try to optimize the total variance of the partitioning, which is a measure of the compactness of the clusters formed. Here, a new kind of error function, which reflects how well the formed cluster centers represent the whole data set, is used as the measure of goodness of the obtained partitioning. This error function decreases monotonically as the number of clusters increases. The minimum separation between two cluster centers is used to normalize the error function. The well-known genetic algorithm based K-means clustering algorithm (GAK-means) is used as the underlying partitioning technique. The number of clusters is varied from 2 to √N, where N is the total number of data points in the data set, and the value of the proposed validity index is recorded for each. The minimum value of the FVQ index over these √N − 1 partitions identifies the appropriate partitioning and number of clusters. Results on five artificially generated and three real-life data sets show the effectiveness of the proposed validity index. For comparison, the number of clusters identified by a well-known cluster validity index, the XB-index, is also reported for the same eight data sets.
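The procedure described in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the formulas, so everything here is an assumption: the error function is taken to be a standard granulation-degranulation reconstruction error with fuzzy C-means style memberships and fuzzifier `m`, the normalizer is the squared minimum distance between centers, and plain Lloyd's K-means stands in for GAK-means. The function names (`reconstruction_error`, `fvq_like_index`, `select_k`) are hypothetical and are not taken from the paper.

```python
import numpy as np


def kmeans(X, k, n_iter=100, seed=0):
    """Plain Lloyd's K-means; a stand-in for the GA-based K-means in the abstract."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Assign each point to its nearest center.
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # Recompute centers; an empty cluster keeps its previous center.
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return centers


def reconstruction_error(X, centers, m=2.0):
    """Granulation-degranulation error (assumed form, not taken from the paper)."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    d2 = np.maximum(d2, 1e-12)  # guard against a point lying exactly on a center
    # Granulation: FCM-style memberships, u_ik proportional to d_ik^(-2/(m-1)).
    inv = d2 ** (-1.0 / (m - 1.0))
    u = inv / inv.sum(axis=1, keepdims=True)
    # Degranulation: rebuild each point as a weighted mean of the centers.
    w = u ** m
    X_hat = (w @ centers) / w.sum(axis=1, keepdims=True)
    return ((X - X_hat) ** 2).sum()


def min_center_separation(centers):
    """Smallest squared distance between any two distinct centers (assumed normalizer)."""
    d2 = ((centers[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d2[np.triu_indices(len(centers), k=1)].min()


def fvq_like_index(X, centers, m=2.0):
    """Reconstruction error normalized by minimum center separation; lower is better."""
    return reconstruction_error(X, centers, m) / min_center_separation(centers)


def select_k(X, m=2.0, seed=0):
    """Sweep K from 2 to floor(sqrt(N)) and return the K with the smallest index."""
    k_max = max(2, int(np.sqrt(len(X))))
    scores = {}
    for k in range(2, k_max + 1):
        centers = kmeans(X, k, seed=seed)
        scores[k] = fvq_like_index(X, centers, m)
    best_k = min(scores, key=scores.get)
    return best_k, scores
```

A call such as `best_k, scores = select_k(X)` on an `(N, d)` NumPy array returns the selected number of clusters together with the index value for every K tried. The normalization matters because the reconstruction error alone keeps shrinking as K grows; dividing by the minimum center separation penalizes partitions that place centers close together.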
