Tuhinangshu Choudhury, Dhruti Shah, N. Karamchandani
Top-m Clustering with a Noisy Oracle
2019 National Conference on Communications (NCC), pp. 1-6. Published 2019-02-01. DOI: 10.1109/NCC.2019.8732224 (https://doi.org/10.1109/NCC.2019.8732224)
Citations: 4
Abstract
In this paper, we analyze the problem of top-$m$ clustering with access to a noisy oracle. We consider a model with $n$ nodes belonging to $k$ clusters. We have access to an oracle which, when queried with a pair of nodes, returns a binary answer indicating whether they belong to the same cluster, but the answer is wrong with probability $p$. Our goal is to identify the top $m$ clusters in terms of size, using the noisy answers from the oracle. This setting was recently studied in [9], which provides an iterative algorithm for the case of complete clustering, i.e., $m=k$. We identify conditions (on the relative sizes of clusters) under which the first $m$ stages of the algorithm recover the top $m$ clusters. We also analyze the query complexity of the algorithm and provide an upper bound that is a function of the number of recovered clusters $m$ and the sizes of the top clusters.
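To make the query model concrete, the following is a minimal Python sketch of a simulated noisy oracle together with the ground-truth target (the $m$ largest clusters). It is not the iterative algorithm of [9] and not the authors' code. The names `make_noisy_oracle` and `top_m_clusters`, the toy instance, and the choice to cache answers so that a repeated query returns the same reply are all assumptions made for illustration; the abstract does not specify them.

```python
import random
from collections import Counter


def make_noisy_oracle(labels, p, seed=0):
    """Build a pairwise same-cluster oracle whose answer is wrong with probability p.

    labels[i] is the (hidden) cluster id of node i.
    Assumption: answers are cached, so asking about the same pair twice
    returns the same (possibly wrong) answer.
    """
    rng = random.Random(seed)
    cache = {}

    def query(u, v):
        key = (min(u, v), max(u, v))  # treat (u, v) and (v, u) as one query
        if key not in cache:
            truth = labels[u] == labels[v]
            flip = rng.random() < p   # True with probability p
            cache[key] = truth != flip  # XOR: invert the truth when flipped
        return cache[key]

    return query


def top_m_clusters(labels, m):
    """Ground-truth target: ids of the m largest clusters, largest first."""
    return [cluster for cluster, _ in Counter(labels).most_common(m)]


# Hypothetical toy instance: n = 10 nodes, k = 3 clusters of sizes 5, 3, 2.
labels = [0] * 5 + [1] * 3 + [2] * 2
oracle = make_noisy_oracle(labels, p=0.1)
target = top_m_clusters(labels, m=2)
```

Here `target` holds the ids of the two largest clusters; a recovery procedure would see only `oracle`, never `labels`, and would be judged by how many queries it needs to identify those clusters.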