Jointly clustering rows and columns of binary matrices: algorithms and trade-offs

Measurement and Modeling of Computer Systems. Pub Date: 2013-10-01. DOI: 10.1145/2591971.2592005

Jiaming Xu, Rui Wu, Kai Zhu, B. Hajek, R. Srikant, Lei Ying


Citations: 42

Abstract

In standard clustering problems, data points are represented by vectors, and stacking them together forms a data matrix with row or column cluster structure. In this paper, we consider a class of binary matrices, arising in many applications, which exhibit both row and column cluster structure, and our goal is to exactly recover the underlying row and column clusters by observing only a small fraction of noisy entries. We first derive a lower bound on the minimum number of observations needed for exact cluster recovery. Then, we study three algorithms with different running times and compare the number of observations each needs for successful cluster recovery. Our analytical results show smooth time-data trade-offs: one can gradually reduce the computational complexity as more observations become available.
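To make the setting concrete, here is a minimal sketch of the kind of data the abstract describes: a binary matrix whose entries depend only on the row cluster and column cluster, seen through partial, noisy observations. The abstract does not specify the observation model or the three algorithms, so everything here is an assumption for illustration: the names `make_block_matrix`, `observe`, and `observed_fraction`, and the choice of independent erasures (`keep_prob`) and independent bit flips (`flip_prob`), are hypothetical. No recovery algorithm from the paper is implemented.

```python
import random


def make_block_matrix(row_labels, col_labels, block_values):
    """Build a binary matrix whose (i, j) entry depends only on the
    cluster of row i and the cluster of column j (assumed model)."""
    return [[block_values[r][c] for c in col_labels] for r in row_labels]


def observe(matrix, keep_prob, flip_prob, rng):
    """Return a partially observed, noisy copy of `matrix`.

    Each entry is kept independently with probability `keep_prob`
    (otherwise it becomes None), and each kept entry is flipped
    independently with probability `flip_prob`. Both are assumptions.
    """
    observed = []
    for row in matrix:
        out = []
        for x in row:
            if rng.random() >= keep_prob:
                out.append(None)        # entry not observed
            elif rng.random() < flip_prob:
                out.append(1 - x)       # observed, but noisy
            else:
                out.append(x)           # observed correctly
        observed.append(out)
    return observed


def observed_fraction(observed):
    """Fraction of entries that were actually observed."""
    total = sum(len(row) for row in observed)
    seen = sum(x is not None for row in observed for x in row)
    return seen / total


if __name__ == "__main__":
    rng = random.Random(0)
    row_labels = [0, 0, 0, 1, 1, 1]          # two row clusters
    col_labels = [0, 0, 1, 1, 2, 2]          # three column clusters
    block_values = [[1, 0, 1], [0, 1, 1]]    # one bit per (row, column) cluster pair
    truth = make_block_matrix(row_labels, col_labels, block_values)
    obs = observe(truth, keep_prob=0.5, flip_prob=0.1, rng=rng)
    for row in obs:
        print(" ".join("?" if x is None else str(x) for x in row))
    print("observed fraction:", observed_fraction(obs))
```

A recovery method would take only the output of `observe` and try to reconstruct `row_labels` and `col_labels` up to relabeling. Lowering `keep_prob` corresponds to the regime the abstract's lower bound addresses, where too few observations make exact recovery impossible.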
