Clustering with Non-adaptive Subset Queries

Hadley Black, Euiwoong Lee, Arya Mazumdar, Barna Saha
arXiv:2409.10908 | arXiv - CS - Data Structures and Algorithms | 2024-09-17
Citations: 0

Abstract

Recovering the underlying clustering of a set $U$ of $n$ points by asking pair-wise same-cluster queries has garnered significant interest in the last decade. Given a query $S \subset U$, $|S|=2$, the oracle returns yes if the points are in the same cluster and no otherwise. For adaptive algorithms with pair-wise queries, the number of required queries is known to be $\Theta(nk)$, where $k$ is the number of clusters. However, non-adaptive schemes require $\Omega(n^2)$ queries, which matches the trivial $O(n^2)$ upper bound attained by querying every pair of points.

To break the quadratic barrier for non-adaptive queries, we study a generalization of this problem to subset queries for $|S|>2$, where the oracle returns the number of clusters intersecting $S$. Allowing for subset queries of unbounded size, $O(n)$ queries are possible with an adaptive scheme (Chakrabarty-Liao, 2024). However, nothing was previously known about non-adaptive algorithms in this setting.

In this paper, we give the first non-adaptive algorithms for clustering with subset queries. Our main result is a non-adaptive algorithm making $O(n \log k \cdot (\log k + \log\log n)^2)$ queries, which improves to $O(n \log \log n)$ when $k$ is a constant. We also consider algorithms with a restricted query size of at most $s$. In this setting we prove that $\Omega(\max(n^2/s^2,n))$ queries are necessary and obtain algorithms making $\tilde{O}(n^2k/s^2)$ queries for any $s \leq \sqrt{n}$ and $\tilde{O}(n^2/s)$ queries for any $s \leq n$. We also consider the natural special case when the clusters are balanced, obtaining non-adaptive algorithms which make $O(n \log k) + \tilde{O}(k)$ and $O(n\log^2 k)$ queries. Finally, allowing two rounds of adaptivity, we give an algorithm making $O(n \log k)$ queries in the general case and $O(n \log \log k)$ queries when the clusters are balanced.
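The following minimal Python sketch, which rests on stated assumptions, makes the query model concrete. The names `SubsetOracle`, `canonical`, `nonadaptive_pairwise`, and `adaptive_binary_search` are hypothetical and introduced only for illustration. The oracle answers a subset query with the number of clusters intersecting $S$, so a pair-wise same-cluster query is the special case $|S|=2$, where an answer of 1 means "same cluster". The first routine is the trivial non-adaptive baseline, which fixes all $n(n-1)/2$ pair queries in advance. The second is a simple adaptive baseline that uses $O(\log k)$ subset queries per point, or $O(n \log k)$ overall. It is neither the $O(n)$ adaptive scheme of Chakrabarty-Liao nor the paper's non-adaptive algorithm, whose details the abstract does not give.

```python
from itertools import combinations


class SubsetOracle:
    """Hypothetical subset-query oracle: given a subset S of point indices,
    return the number of distinct clusters that intersect S.
    A pair-wise same-cluster query is the special case |S| = 2 (answer 1 = same)."""

    def __init__(self, labels):
        self.labels = labels  # hidden ground-truth cluster label of each point
        self.queries = 0      # number of queries asked so far

    def count(self, subset):
        self.queries += 1
        return len({self.labels[x] for x in subset})


def canonical(labels):
    """Relabel clusters by order of first appearance so that two labelings
    of the same partition compare equal."""
    seen = {}
    return [seen.setdefault(lab, len(seen)) for lab in labels]


def nonadaptive_pairwise(oracle, n):
    """Trivial non-adaptive baseline: every pair is queried, and the query set
    does not depend on any answer. Uses n(n-1)/2 queries."""
    answers = {(a, b): oracle.count([a, b]) for a, b in combinations(range(n), 2)}

    parent = list(range(n))  # union-find over points

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for (a, b), c in answers.items():
        if c == 1:  # a and b lie in the same cluster
            parent[find(a)] = find(b)
    return [find(x) for x in range(n)]


def adaptive_binary_search(oracle, n):
    """Simple adaptive baseline (not the paper's method, not Chakrabarty-Liao):
    keep one representative per discovered cluster and locate each new point's
    cluster by binary search over the representatives, O(log k) queries per point."""
    reps = []        # reps[i] is a representative point of cluster i
    assign = [0] * n
    for x in range(n):
        if oracle.count(reps + [x]) == len(reps) + 1:
            assign[x] = len(reps)  # x belongs to no known cluster: open a new one
            reps.append(x)
            continue
        lo, hi = 0, len(reps)      # invariant: x's cluster is among reps[lo:hi]
        while hi - lo > 1:
            mid = (lo + hi) // 2
            left = reps[lo:mid]
            if oracle.count(left + [x]) == len(left):
                hi = mid           # adding x adds no cluster: it is in the left half
            else:
                lo = mid
        assign[x] = lo
    return assign


labels = [0, 1, 0, 2, 1, 1, 2, 0]
o1, o2 = SubsetOracle(labels), SubsetOracle(labels)
print(canonical(nonadaptive_pairwise(o1, len(labels))) == canonical(labels), o1.queries)  # True 28
print(adaptive_binary_search(o2, len(labels)) == labels, o2.queries)  # True 16
```

On this toy instance ($n = 8$, $k = 3$) the all-pairs baseline spends 28 queries and the binary-search baseline spends 16. The gap widens with $n$, because the former grows as $n^2$ and the latter as $n \log k$. The price is adaptivity: each binary-search query depends on earlier answers.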