Jointly clustering rows and columns of binary matrices: algorithms and trade-offs

Jiaming Xu, Rui Wu, Kai Zhu, B. Hajek, R. Srikant, Lei Ying
Journal: Measurement and Modeling of Computer Systems
Publication date: 2013-10-01
DOI: 10.1145/2591971.2592005 (https://doi.org/10.1145/2591971.2592005)
Citations: 42

Abstract

In standard clustering problems, data points are represented by vectors, and by stacking them together, one forms a data matrix with row or column cluster structure. In this paper, we consider a class of binary matrices, arising in many applications, which exhibit both row and column cluster structure, and our goal is to exactly recover the underlying row and column clusters by observing only a small fraction of noisy entries. We first derive a lower bound on the minimum number of observations needed for exact cluster recovery. Then, we study three algorithms with different running times and compare the number of observations each needs for successful cluster recovery. Our analytical results show smooth time-data trade-offs: one can gradually reduce the computational complexity as more observations become available.
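The setting described in the abstract can be pictured with a small simulation. The sketch below is only an illustration under assumptions the abstract does not state: entries are observed independently with a fixed probability, noise is a symmetric bit flip, and every name (`make_observations`, `observe_prob`, `flip_prob`) is hypothetical and not taken from the paper.

```python
import random


def make_observations(n_rows, n_cols, row_k, col_k, observe_prob, flip_prob, seed=0):
    """Simulate a binary matrix with row and column clusters, seen through
    erasures and noise. The sampling and noise models are assumptions."""
    rng = random.Random(seed)
    # Assign every row and every column to a cluster.
    row_labels = [rng.randrange(row_k) for _ in range(n_rows)]
    col_labels = [rng.randrange(col_k) for _ in range(n_cols)]
    # One binary value per (row cluster, column cluster) block.
    block = [[rng.randint(0, 1) for _ in range(col_k)] for _ in range(row_k)]
    observed = {}
    for i in range(n_rows):
        for j in range(n_cols):
            # Keep only a fraction of the entries (assumed independent sampling).
            if rng.random() < observe_prob:
                value = block[row_labels[i]][col_labels[j]]
                # Flip the entry with probability flip_prob (assumed noise model).
                if rng.random() < flip_prob:
                    value = 1 - value
                observed[(i, j)] = value
    return row_labels, col_labels, block, observed
```

A recovery algorithm would receive only `observed` and would have to return the row and column labels exactly, up to a relabeling of the clusters. Lowering `observe_prob` corresponds to the regime where, according to the abstract, more computation is needed to succeed.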