On the optimality of k-means clustering
Lori A. Dalton
2013 IEEE International Workshop on Genomic Signal Processing and Statistics, published 2013-11-01
DOI: 10.1109/GENSIPS.2013.6735934 (https://doi.org/10.1109/GENSIPS.2013.6735934)
Citations: 4
Abstract
Although it is typically accepted that cluster analysis is a subjective activity, without an objective framework it is impossible to understand, let alone guarantee, the predictive capacity of clustering. To address this, recent work uses random point process theory to develop a probabilistic theory of clustering. The theory fully parallels Bayes decision theory for classification: given a known underlying process and a specified cost function, there exist Bayes clustering operators with minimum expected error. Clustering is hence transformed from a subjective activity into an objective operation. In this work, we present conditions under which the optimization function used in classical k-means clustering is optimal in the new Bayes clustering theory, and thus begin to understand this algorithm objectively.
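For readers unfamiliar with the objective in question, the following is a minimal sketch of the textbook k-means cost: the within-cluster sum of squared Euclidean distances to each cluster mean. The abstract does not state the paper's exact formulation, so this form is an assumption, and the names `kmeans_cost` and `best_partition` are hypothetical. The exhaustive search is for illustration on tiny inputs only; it is neither Lloyd's algorithm nor the Bayes clustering operator described above.

```python
from itertools import product


def kmeans_cost(points, labels):
    """Within-cluster sum of squared Euclidean distances to each cluster mean."""
    # Group points by their assigned cluster label.
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)

    cost = 0.0
    for members in clusters.values():
        dim = len(members[0])
        # Cluster mean, computed coordinate by coordinate.
        mean = [sum(m[d] for m in members) / len(members) for d in range(dim)]
        cost += sum((m[d] - mean[d]) ** 2 for m in members for d in range(dim))
    return cost


def best_partition(points, k):
    """Exhaustively search every labeling that uses all k clusters."""
    best = None
    for labels in product(range(k), repeat=len(points)):
        if len(set(labels)) != k:
            continue  # skip labelings that leave a cluster empty
        cost = kmeans_cost(points, labels)
        if best is None or cost < best[0]:
            best = (cost, labels)
    return best


# Hypothetical toy data: two well-separated pairs of points.
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0)]
cost, labels = best_partition(points, 2)
```

On this toy data the minimizing partition groups the first two points together and the last two together. Whether such a minimizer is also optimal in the Bayes clustering sense depends on the conditions the paper derives, which this sketch does not model.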