Minimum description length and clustering with exemplars

2009 IEEE International Symposium on Information Theory. Pub Date: 2009-06-28. DOI: 10.1109/ISIT.2009.5205937

Po-Hsiang Lai, J. O’Sullivan, Robert Pless


Citations: 5

Abstract

We propose an information-theoretic clustering framework for density-based clustering and for similarity- or distance-based clustering, with objective functions for clustering performance derived from stochastic complexity and minimum description length (MDL) arguments. Under this framework, the number of clusters and the parameters can be determined in a principled way without prior knowledge from users. We show that similarity-based clustering can be viewed as combinatorial optimization on graphs. We propose two clustering algorithms, one of which relies on a minimum arborescence algorithm and returns the optimal clustering under the proposed MDL objective function for similarity-based clustering. We demonstrate clustering performance on synthetic data.
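To make the graph view concrete, here is a minimal Python sketch of how a minimum arborescence can yield exemplar-based clusters. It is an illustration under stated assumptions, not the authors' algorithm or objective: the virtual node `ROOT`, the constant `exemplar_cost` (standing in for the code length of describing a point on its own), and the squared-distance edge cost (standing in for the code length of describing a point relative to another) are all hypothetical choices, as are the function names. The arborescence itself is found with the standard Chu-Liu/Edmonds procedure.

```python
from itertools import permutations

ROOT = "root"  # hypothetical virtual node: an edge ROOT -> j means "j is an exemplar"


def find_cycle(parent):
    """Return the set of nodes on a cycle of the parent pointers, or None."""
    for start in parent:
        path, v = [], start
        while v != ROOT and v not in path:
            path.append(v)
            v = parent[v]
        if v != ROOT:  # we walked back onto the current path: a cycle
            return set(path[path.index(v):])
    return None


def min_arborescence(edges):
    """Chu-Liu/Edmonds. `edges` maps (u, v) -> cost; returns {child: parent}."""
    # Step 1: every non-root node greedily picks its cheapest incoming edge.
    best = {}
    for (u, v), w in edges.items():
        if v != ROOT and (v not in best or w < edges[(best[v], v)]):
            best[v] = u
    cycle = find_cycle(best)
    if cycle is None:
        return best
    # Step 2: contract the cycle into one super-node and re-weight entering edges.
    c = frozenset(cycle)
    new_edges, origin = {}, {}
    for (u, v), w in edges.items():
        if u in cycle and v in cycle:
            continue
        if v in cycle:
            key, w = (u, c), w - edges[(best[v], v)]
        elif u in cycle:
            key = (c, v)
        else:
            key = (u, v)
        if key not in new_edges or w < new_edges[key]:
            new_edges[key], origin[key] = w, (u, v)
    # Step 3: solve the smaller problem, then map its edges back.
    parent = {}
    for v, u in min_arborescence(new_edges).items():
        orig_u, orig_v = origin[(u, v)]
        parent[orig_v] = orig_u
    for v in cycle:  # cycle nodes keep their greedy edge, except the entry node
        parent.setdefault(v, best[v])
    return parent


def cluster(points, exemplar_cost):
    """Label each 1-D point with the index of its exemplar (assumed cost model)."""
    n = len(points)
    edges = {(ROOT, j): exemplar_cost for j in range(n)}
    for i, j in permutations(range(n), 2):
        edges[(i, j)] = (points[i] - points[j]) ** 2  # assumed relative code length
    parent = min_arborescence(edges)

    def exemplar(j):
        while parent[j] != ROOT:
            j = parent[j]
        return j

    return [exemplar(j) for j in range(n)]


if __name__ == "__main__":
    print(cluster([0.0, 0.1, 0.2, 5.0, 5.1, 9.0], exemplar_cost=1.0))
```

With these toy costs the six points fall into three clusters, and the number of clusters is never passed in: it emerges from the trade-off between paying `exemplar_cost` for a new exemplar and paying the relative cost of attaching a point to an existing tree. Raising `exemplar_cost` merges clusters, and lowering it splits them.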

