Top-m Clustering with a Noisy Oracle

2019 National Conference on Communications (NCC). Pub Date: 2019-02-01. DOI: 10.1109/NCC.2019.8732224

Tuhinangshu Choudhury, Dhruti Shah, N. Karamchandani


Citations: 4

Abstract

In this paper, we analyze the problem of top-$m$ clustering with access to a noisy oracle. We consider a model with $n$ nodes belonging to $k$ clusters. We have access to an oracle that, when queried with a pair of nodes, returns a binary answer indicating whether the two nodes belong to the same cluster, but with a probability of error $p$. Our goal is to identify the top $m$ clusters in terms of size, using the noisy answers from the oracle. This setting was recently studied in [9], which provides an iterative algorithm for the case of complete clustering, i.e., $m=k$. We identify conditions (on the relative sizes of clusters) under which the first $m$ stages of the algorithm would recover the top $m$ clusters. We also analyze the query complexity of the algorithm and provide an upper bound that is a function of the number of recovered clusters $m$ and the sizes of the top clusters.
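The query model described in the abstract can be illustrated with a minimal sketch. This is not the algorithm from the paper or from [9]; it only simulates the oracle and states the recovery target. The names `make_noisy_oracle` and `top_m_clusters`, the toy `labels` list, and the choice to draw the noise independently on every call are assumptions made for illustration, since the abstract does not specify how repeated queries behave.

```python
import random
from collections import Counter


def make_noisy_oracle(labels, p, seed=None):
    """Build a pairwise same-cluster oracle that errs with probability p.

    labels[i] is the (hidden) ground-truth cluster of node i.
    Assumption: noise is drawn independently on every call.
    """
    rng = random.Random(seed)

    def query(u, v):
        truth = labels[u] == labels[v]
        # Flip the true binary answer with probability p.
        return (not truth) if rng.random() < p else truth

    return query


def top_m_clusters(labels, m):
    """Recovery target: the m largest clusters as (cluster_id, size) pairs."""
    return Counter(labels).most_common(m)


# Toy instance: n = 10 nodes in k = 4 clusters of sizes 4, 3, 2, 1.
labels = [0, 0, 0, 0, 1, 1, 1, 2, 2, 3]
oracle = make_noisy_oracle(labels, p=0.1, seed=0)

# Noisy answers for node 0 against every other node.
answers = [oracle(0, v) for v in range(1, len(labels))]

print(top_m_clusters(labels, m=2))  # [(0, 4), (1, 3)]
```

A clustering algorithm in this setting sees only the return values of `oracle`, never `labels`; its query complexity is the number of calls it makes before it outputs the top $m$ clusters.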
