Top-m Clustering with a Noisy Oracle

2019 National Conference on Communications (NCC) | Pub Date: 2019-02-01 | DOI: 10.1109/NCC.2019.8732224

Tuhinangshu Choudhury, Dhruti Shah, N. Karamchandani

Citations: 4

Abstract

In this paper, we analyze the problem of top-$m$ clustering with access to a noisy oracle. We consider a model with $n$ nodes belonging to $k$ clusters. We have access to an oracle which, when queried with a pair of nodes, returns a binary answer indicating whether they belong to the same cluster, but errs with probability $p$. Our goal is to identify the top $m$ clusters by size using the noisy answers from the oracle. This setting was recently studied in [9], which provides an iterative algorithm for the case of complete clustering, i.e., $m=k$. We identify conditions on the relative sizes of the clusters under which the first $m$ stages of the algorithm recover the top $m$ clusters. We also analyze the query complexity of the algorithm and provide an upper bound that is a function of the number of recovered clusters $m$ and the sizes of the top clusters.
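To make the query model concrete, here is a minimal Python sketch of a pairwise oracle that flips the true same-cluster answer with probability $p$. It is an illustration only, not the algorithm of [9] or the paper's method. The names `NoisyOracle`, `query`, and `top_m_clusters` are hypothetical, and the sketch assumes the answer for a given pair is fixed once drawn, which the abstract does not specify.

```python
import random
from collections import Counter


class NoisyOracle:
    """Hypothetical simulator of the pairwise oracle described in the abstract."""

    def __init__(self, labels, p, seed=0):
        self.labels = labels          # labels[i] is the true cluster of node i
        self.p = p                    # probability that an answer is wrong
        self.rng = random.Random(seed)
        self.answers = {}             # assumption: one fixed answer per pair
        self.num_queries = 0          # counts every call to query()

    def query(self, u, v):
        """Return a (possibly wrong) answer to 'are u and v in the same cluster?'."""
        self.num_queries += 1
        key = (min(u, v), max(u, v))
        if key not in self.answers:
            truth = self.labels[u] == self.labels[v]
            flip = self.rng.random() < self.p
            self.answers[key] = truth != flip   # XOR: invert the truth when flipped
        return self.answers[key]


def top_m_clusters(labels, m):
    """Ground-truth target: the m largest cluster ids, largest first."""
    return [cluster for cluster, _ in Counter(labels).most_common(m)]


if __name__ == "__main__":
    labels = [0, 0, 0, 1, 1, 2]       # n = 6 nodes, k = 3 clusters
    oracle = NoisyOracle(labels, p=0.1)
    print(oracle.query(0, 1), oracle.num_queries)
    print(top_m_clusters(labels, 2))
```

A recovery algorithm would only see `query` results and would be judged against `top_m_clusters`, with `num_queries` standing in for query complexity.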



