A validity index method for clusters with different degrees of dispersion and overlap
P. Lin, P. Huang, Che-Yu Li
2016 Eighth International Conference on Advanced Computational Intelligence (ICACI), February 2016
DOI: 10.1109/ICACI.2016.7449829
Citation count: 4
Abstract
A cluster validity index is used to estimate the quality of the partitions that a clustering algorithm produces for a dataset, and to find the optimal number of clusters into which the dataset should be partitioned. In this paper, we propose a new validity index based on a dispersion measure and an overlap measure. The dispersion measure estimates the overall data density of the clusters in the dataset, whereas the overlap measure estimates the degree of isolation among all clusters. A low degree of dispersion means that the clusters are, overall, densely distributed and hence compact; a low degree of overlap means that the clusters are, overall, well separated. Thus, a good clustering result is expected to have both a low dispersion measure and a low overlap measure. We conducted several experiments on artificial datasets and on public real-world datasets to validate the effectiveness of our validity index. Experimental results show that, compared with nine other indices proposed in the literature, our index is more effective and reliable at estimating the optimal number of clusters when the clusters differ widely in their degrees of dispersion and overlap.
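The abstract describes the general recipe behind such indices. First, score the partition produced for each candidate number of clusters with a measure that penalizes dispersion and overlap. Then keep the lowest-scoring candidate. The abstract does not give the formulas for the proposed dispersion and overlap measures, so the minimal sketch below substitutes the classic Davies-Bouldin index. That index likewise averages, over clusters, the worst-case ratio of combined within-cluster scatter to centroid separation, and lower values are better. It is a stand-in for illustration only, not the index proposed by Lin, Huang, and Li. The helper names (`kmeans`, `davies_bouldin`, `best_k`) and the toy dataset are hypothetical.

```python
import math
from typing import Iterable, List, Sequence, Tuple

Point = Tuple[float, ...]


def dist(a: Point, b: Point) -> float:
    """Euclidean distance between two points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean_point(points: Sequence[Point]) -> Point:
    """Coordinate-wise mean (centroid) of a non-empty group of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def kmeans(points: Sequence[Point], k: int, iters: int = 100) -> List[int]:
    """Lloyd's k-means with deterministic farthest-first seeding."""
    centers: List[Point] = [points[0]]
    while len(centers) < k:
        # Next seed: the point farthest from every center chosen so far.
        centers.append(max(points, key=lambda p: min(dist(p, c) for c in centers)))
    labels: List[int] = []
    for _ in range(iters):
        # Assignment step: each point joins its nearest center.
        labels = [min(range(k), key=lambda j: dist(p, centers[j])) for p in points]
        # Update step: move each center to the mean of its members
        # (an empty cluster keeps its previous center).
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            new_centers.append(mean_point(members) if members else centers[j])
        if new_centers == centers:
            break
        centers = new_centers
    return labels


def davies_bouldin(points: Sequence[Point], labels: Sequence[int]) -> float:
    """Davies-Bouldin index (lower = more compact, better separated).

    Stand-in only: NOT the dispersion/overlap index proposed in the paper.
    Requires at least two clusters with distinct centroids.
    """
    ids = sorted(set(labels))
    groups = {j: [p for p, lab in zip(points, labels) if lab == j] for j in ids}
    centers = {j: mean_point(g) for j, g in groups.items()}
    # Per-cluster dispersion: mean distance of members to their centroid.
    scatter = {
        j: sum(dist(p, centers[j]) for p in g) / len(g) for j, g in groups.items()
    }
    total = 0.0
    for i in ids:
        # Worst-case overlap of cluster i: combined scatter over centroid gap.
        total += max(
            (scatter[i] + scatter[j]) / dist(centers[i], centers[j])
            for j in ids
            if j != i
        )
    return total / len(ids)


def best_k(points: Sequence[Point], k_range: Iterable[int]) -> int:
    """Pick the k whose k-means partition minimizes the index."""
    return min(k_range, key=lambda k: davies_bouldin(points, kmeans(points, k)))


if __name__ == "__main__":
    # Hypothetical toy data: three tight, well-separated groups of points.
    data = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3),
            (5.0, 5.0), (5.2, 4.9), (4.9, 5.3),
            (0.0, 5.0), (0.3, 5.1), (0.1, 4.8)]
    print(best_k(data, range(2, 6)))  # prints 3
```

The Davies-Bouldin index is undefined for a single cluster, so the search starts at k = 2. The index also reaches zero when every point is its own cluster. For that reason, the candidate range is kept well below the number of points.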