A Quality Metric for K-Means Clustering
M. Thulasidas
2018 14th International Conference on Natural Computation, Fuzzy Systems and Knowledge Discovery (ICNC-FSKD), published 2018-07-01
DOI: 10.1109/FSKD.2018.8687210 (https://doi.org/10.1109/FSKD.2018.8687210)
Citation count: 0
Abstract
From a teaching perspective, the K-Means clustering algorithm features in introductory data analytics courses because of its conceptual simplicity. However, it suffers from two drawbacks: variable selection and the determination of the optimal number of clusters. In this paper, we present a new, mathematically defensible quality metric for K-Means clustering based on the standard score of the distribution of the centroids. Furthermore, we demonstrate how this Standard Score Metric (SSM) can be used for automatic variable selection and for determining the optimal number of clusters, using well-known data sets as well as real data collected locally.
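The abstract does not say how SSM is actually computed, so the following is only a rough sketch of the general idea of looking at centroids through standard scores. It is not the paper's definition. It runs a plain Lloyd's-style K-Means on toy data and expresses each centroid coordinate as a z-score relative to the overall mean and standard deviation of that variable. The function names (`kmeans`, `centroid_z_scores`, `spread_per_variable`), the toy data, the deterministic initialization, and the per-variable summary (mean absolute z-score) are all assumptions made for illustration.

```python
from statistics import mean, pstdev


def kmeans(points, k, iters=50):
    """Plain Lloyd's algorithm on a list of equal-length tuples."""
    # Deterministic start (an assumption): k points spread evenly through the list.
    step = len(points) // k
    centroids = [points[i * step] for i in range(k)]
    for _ in range(iters):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            dists = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
            clusters[dists.index(min(dists))].append(p)
        # Update step: move each centroid to the mean of its members
        # (an empty cluster keeps its previous centroid).
        new = [
            tuple(mean(col) for col in zip(*cl)) if cl else centroids[i]
            for i, cl in enumerate(clusters)
        ]
        if new == centroids:
            break
        centroids = new
    return centroids


def centroid_z_scores(points, centroids):
    """Z-score of every centroid coordinate against the whole data set."""
    mus = [mean(col) for col in zip(*points)]
    sigmas = [pstdev(col) for col in zip(*points)]
    return [
        [(c - m) / s if s > 0 else 0.0 for c, m, s in zip(cen, mus, sigmas)]
        for cen in centroids
    ]


def spread_per_variable(z_rows):
    """Hypothetical summary (not SSM): mean |z| of the centroids, per variable."""
    return [mean(abs(z) for z in col) for col in zip(*z_rows)]


# Toy data (made up): variable 0 separates two groups, variable 1 does not.
points = [(1.0, 5.0), (1.2, 3.0), (0.8, 4.0), (9.0, 4.0), (9.2, 5.0), (8.8, 3.0)]
centroids = kmeans(points, k=2)
z = centroid_z_scores(points, centroids)
spread = spread_per_variable(z)
print(centroids)
print(spread)
```

In this toy case the centroids sit about one standard deviation from the overall mean along the first variable and exactly at the mean along the second, which hints at why centroid standard scores could help flag variables that contribute little to the clustering. How the paper turns such scores into a single metric, and how it uses that metric to pick the number of clusters, is not stated in the abstract.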