Minimum description length and clustering with exemplars
Po-Hsiang Lai, J. O’Sullivan, Robert Pless
2009 IEEE International Symposium on Information Theory, 2009-06-28
DOI: 10.1109/ISIT.2009.5205937 (https://doi.org/10.1109/ISIT.2009.5205937)
Citation count: 5
Abstract
We propose an information-theoretic framework for density-based clustering and for similarity- or distance-based clustering, in which the objective functions that measure clustering performance are derived from stochastic complexity and minimum description length (MDL) arguments. Under this framework, the number of clusters and the parameters can be determined in a principled way without prior knowledge from users. We show that similarity-based clustering can be viewed as combinatorial optimization on graphs. We propose two clustering algorithms, one of which relies on a minimum arborescence algorithm and returns the optimal clustering under the proposed MDL objective function for similarity-based clustering. We demonstrate clustering performance on synthetic data.
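The abstract does not give the paper's cost functions or its algorithm in detail, so the following is only a toy sketch of how exemplar clustering might be cast as a minimum arborescence problem. It assumes a virtual root node, a hypothetical cost `root_cost[j]` (bits to describe point `j` on its own, making it an exemplar), and a hypothetical cost `pair_cost[i][j]` (bits to describe `j` given `i`). All function names and the toy costs are illustrative assumptions, and the search is a plain exhaustive enumeration that is practical only for a handful of points, not an efficient arborescence algorithm such as Chu-Liu/Edmonds.

```python
from itertools import product


def reaches_root(parent, node, root):
    """Follow parent pointers from node; True if the root is reached without a cycle."""
    seen = set()
    while node != root:
        if node in seen:
            return False
        seen.add(node)
        node = parent[node]
    return True


def min_arborescence_bruteforce(n, root_cost, pair_cost):
    """Exhaustively find the cheapest spanning arborescence rooted at a virtual root.

    Points are labelled 0..n-1 and the virtual root is labelled n.
    root_cost[j] and pair_cost[i][j] are hypothetical code lengths in bits.
    """
    root = n
    best_cost, best_parent = float("inf"), None
    # Each point chooses exactly one parent: the root or any other point.
    choices = [[p for p in range(n + 1) if p != j] for j in range(n)]
    for assignment in product(*choices):
        parent = dict(enumerate(assignment))
        # Keep only assignments in which every point is connected to the root.
        if not all(reaches_root(parent, j, root) for j in range(n)):
            continue
        cost = sum(
            root_cost[j] if p == root else pair_cost[p][j]
            for j, p in parent.items()
        )
        if cost < best_cost:
            best_cost, best_parent = cost, parent
    return best_cost, best_parent


def clusters_from_parents(parent, n):
    """Group points by the child of the root (the exemplar) that they descend from."""
    root = n
    groups = {}
    for j in range(n):
        node = j
        while parent[node] != root:
            node = parent[node]
        groups.setdefault(node, []).append(j)
    return groups


# Toy data: five points on a line forming two well-separated groups.
points = [0.0, 1.0, 2.0, 10.0, 11.0]
n = len(points)
# Assumed costs: 8 bits to describe a point alone, |x_i - x_j| + 1 bits given a neighbour.
root_cost = [8.0] * n
pair_cost = [[abs(a - b) + 1.0 for b in points] for a in points]

total_bits, parent = min_arborescence_bruteforce(n, root_cost, pair_cost)
groups = clusters_from_parents(parent, n)
print(total_bits, groups)
```

In this sketch the number of clusters is not supplied by the user: it falls out of how many points attach directly to the virtual root in the cheapest arborescence, which mirrors the abstract's claim that the number of clusters can be determined from the description-length objective. Which member of a cluster ends up as its exemplar may be arbitrary when several choices give the same total cost.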