Hadley Black, Euiwoong Lee, Arya Mazumdar, Barna Saha
{"title":"Clustering with Non-adaptive Subset Queries","authors":"Hadley Black, Euiwoong Lee, Arya Mazumdar, Barna Saha","doi":"arxiv-2409.10908","DOIUrl":null,"url":null,"abstract":"Recovering the underlying clustering of a set $U$ of $n$ points by asking\npair-wise same-cluster queries has garnered significant interest in the last\ndecade. Given a query $S \\subset U$, $|S|=2$, the oracle returns yes if the\npoints are in the same cluster and no otherwise. For adaptive algorithms with\npair-wise queries, the number of required queries is known to be $\\Theta(nk)$,\nwhere $k$ is the number of clusters. However, non-adaptive schemes require\n$\\Omega(n^2)$ queries, which matches the trivial $O(n^2)$ upper bound attained\nby querying every pair of points. To break the quadratic barrier for non-adaptive queries, we study a\ngeneralization of this problem to subset queries for $|S|>2$, where the oracle\nreturns the number of clusters intersecting $S$. Allowing for subset queries of\nunbounded size, $O(n)$ queries is possible with an adaptive scheme\n(Chakrabarty-Liao, 2024). However, the realm of non-adaptive algorithms is\ncompletely unknown. In this paper, we give the first non-adaptive algorithms for clustering with\nsubset queries. Our main result is a non-adaptive algorithm making $O(n \\log k\n\\cdot (\\log k + \\log\\log n)^2)$ queries, which improves to $O(n \\log \\log n)$\nwhen $k$ is a constant. We also consider algorithms with a restricted query\nsize of at most $s$. In this setting we prove that $\\Omega(\\max(n^2/s^2,n))$\nqueries are necessary and obtain algorithms making $\\tilde{O}(n^2k/s^2)$\nqueries for any $s \\leq \\sqrt{n}$ and $\\tilde{O}(n^2/s)$ queries for any $s\n\\leq n$. We also consider the natural special case when the clusters are\nbalanced, obtaining non-adaptive algorithms which make $O(n \\log k) +\n\\tilde{O}(k)$ and $O(n\\log^2 k)$ queries. Finally, allowing two rounds of\nadaptivity, we give an algorithm making $O(n \\log k)$ queries in the general\ncase and $O(n \\log \\log k)$ queries when the clusters are balanced.","PeriodicalId":501525,"journal":{"name":"arXiv - CS - Data Structures and Algorithms","volume":"27 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2024-09-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"arXiv - CS - Data Structures and Algorithms","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/arxiv-2409.10908","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0
Abstract
Recovering the underlying clustering of a set $U$ of $n$ points by asking
pair-wise same-cluster queries has garnered significant interest in the last
decade. Given a query $S \subset U$, $|S|=2$, the oracle returns yes if the
points are in the same cluster and no otherwise. For adaptive algorithms with
pair-wise queries, the number of required queries is known to be $\Theta(nk)$,
where $k$ is the number of clusters. However, non-adaptive schemes require
$\Omega(n^2)$ queries, which matches the trivial $O(n^2)$ upper bound attained
by querying every pair of points. To break the quadratic barrier for non-adaptive queries, we study a
generalization of this problem to subset queries for $|S|>2$, where the oracle
returns the number of clusters intersecting $S$. Allowing for subset queries of
unbounded size, $O(n)$ queries is possible with an adaptive scheme
(Chakrabarty-Liao, 2024). However, the realm of non-adaptive algorithms is
completely unknown. In this paper, we give the first non-adaptive algorithms for clustering with
subset queries. Our main result is a non-adaptive algorithm making $O(n \log k
\cdot (\log k + \log\log n)^2)$ queries, which improves to $O(n \log \log n)$
when $k$ is a constant. We also consider algorithms with a restricted query
size of at most $s$. In this setting we prove that $\Omega(\max(n^2/s^2,n))$
queries are necessary and obtain algorithms making $\tilde{O}(n^2k/s^2)$
queries for any $s \leq \sqrt{n}$ and $\tilde{O}(n^2/s)$ queries for any $s
\leq n$. We also consider the natural special case when the clusters are
balanced, obtaining non-adaptive algorithms which make $O(n \log k) +
\tilde{O}(k)$ and $O(n\log^2 k)$ queries. Finally, allowing two rounds of
adaptivity, we give an algorithm making $O(n \log k)$ queries in the general
case and $O(n \log \log k)$ queries when the clusters are balanced.