Minimum description length and clustering with exemplars
Po-Hsiang Lai, J. O’Sullivan, Robert Pless
2009 IEEE International Symposium on Information Theory, published 2009-06-28
DOI: 10.1109/ISIT.2009.5205937
We propose an information-theoretic framework for density-based clustering and for similarity- or distance-based clustering. In this framework, the objective functions that measure clustering performance are derived from stochastic complexity and minimum description length (MDL) arguments. As a result, the number of clusters and the model parameters can be determined in a principled way, without prior knowledge from the user. We show that similarity-based clustering can be viewed as a combinatorial optimization problem on graphs. We propose two clustering algorithms. One of them relies on a minimum arborescence algorithm and returns the optimal clustering under the proposed MDL objective function for similarity-based clustering. We demonstrate clustering performance on synthetic data.
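The sketch below illustrates the graph view of exemplar-based clustering. It assumes a construction that the abstract does not spell out. A virtual root node has an arc to every point, and the cost of that arc is the number of bits needed to describe the point as an exemplar. An arc i→j costs the number of bits needed to describe point j given point i. Under these assumptions, any spanning arborescence rooted at the virtual root is a code for the whole data set, and its total arc cost is the description length. The subtrees hanging off the root are the clusters, so the number of clusters comes out of the minimization rather than being supplied by the user.

The minimum arborescence is computed with the standard Chu-Liu/Edmonds algorithm. Several parts of the sketch are hypothetical choices for illustration, not the paper's objective:
- the cost model (a uniform code over a bounded range for exemplars, and a quantized Gaussian code for the other points);
- the 1-D data;
- the names mdl_exemplar_clusters, sigma, delta, lo and hi.

This sketch also lets a point be encoded relative to another non-exemplar point, so a cluster may form a chain rather than a star.

```python
import math


def _find_cycle(parent, root):
    """Return the nodes of a cycle in the parent map, or None if it is acyclic."""
    walk_id = {}  # node -> start node of the walk that first reached it
    for start in parent:
        v = start
        while v != root and v not in walk_id:
            walk_id[v] = start
            v = parent[v]
        if v != root and walk_id[v] == start:  # walked back onto this walk's path
            cycle, u = [v], parent[v]
            while u != v:
                cycle.append(u)
                u = parent[u]
            return cycle
    return None


def min_arborescence(nodes, edges, root):
    """Chu-Liu/Edmonds: return {child: parent} for a minimum-cost arborescence.

    edges maps a directed arc (u, v) to its cost.
    """
    # 1. Every non-root node picks its cheapest incoming arc.
    best_in = {}
    for (u, v), w in edges.items():
        if v != root and u != v and (v not in best_in or w < edges[(best_in[v], v)]):
            best_in[v] = u
    missing = [v for v in nodes if v != root and v not in best_in]
    if missing:
        raise ValueError(f"no spanning arborescence; no incoming arc for {missing}")
    cycle = _find_cycle(best_in, root)
    if cycle is None:
        return best_in
    # 2. Contract the cycle into one node, re-weighting arcs that enter it.
    in_cycle = set(cycle)
    c = ("cycle", frozenset(cycle))
    new_edges, origin = {}, {}
    for (u, v), w in edges.items():
        if u in in_cycle and v in in_cycle:
            continue
        if v in in_cycle:
            key, w = (u, c), w - edges[(best_in[v], v)]
        elif u in in_cycle:
            key = (c, v)
        else:
            key = (u, v)
        if key not in new_edges or w < new_edges[key]:
            new_edges[key], origin[key] = w, (u, v)
    new_nodes = [n for n in nodes if n not in in_cycle] + [c]
    # 3. Solve the smaller problem, then expand the contracted node.
    parent = {}
    for v, p in min_arborescence(new_nodes, new_edges, root).items():
        u0, v0 = origin[(p, v)]
        parent[v0] = u0
        if v == c:  # break the cycle where the chosen arc enters it
            for x in cycle:
                if x != v0:
                    parent[x] = best_in[x]
    return parent


def mdl_exemplar_clusters(xs, sigma, delta, lo, hi):
    """Hypothetical MDL arc costs for 1-D points; returns (clusters, total_bits)."""
    root = "ROOT"
    n = len(xs)
    exemplar_bits = math.log2((hi - lo) / delta)  # uniform code on [lo, hi)
    base_bits = math.log2(sigma * math.sqrt(2 * math.pi) / delta)
    edges = {}
    for j in range(n):
        edges[(root, j)] = exemplar_bits
        for i in range(n):
            if i != j:  # quantized Gaussian code for x_j centred at x_i
                edges[(i, j)] = base_bits + (xs[j] - xs[i]) ** 2 / (
                    2 * sigma**2 * math.log(2))
    parent = min_arborescence([root] + list(range(n)), edges, root)
    total_bits = sum(edges[(p, v)] for v, p in parent.items())
    clusters = {}
    for v in range(n):
        e = v
        while parent[e] != root:  # climb to the exemplar heading this subtree
            e = parent[e]
        clusters.setdefault(e, []).append(v)
    return clusters, total_bits


if __name__ == "__main__":
    xs = [0.0, 0.1, 0.2, 5.0, 5.1, 5.3]
    clusters, bits = mdl_exemplar_clusters(xs, sigma=0.5, delta=0.01, lo=-10.0, hi=10.0)
    print(sorted(clusters.values()))  # [[0, 1, 2], [3, 4, 5]]
    print(f"{bits:.1f} bits")
```

Each additional exemplar costs log2((hi - lo)/delta) bits. A point described relative to a nearby point costs roughly log2(sigma·sqrt(2π)/delta) bits plus a distance penalty. The minimizer therefore opens a new cluster only when no existing point can describe a point more cheaply.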