Top-m Clustering with a Noisy Oracle
Tuhinangshu Choudhury, Dhruti Shah, N. Karamchandani
2019 National Conference on Communications (NCC), pp. 1-6, February 2019. DOI: 10.1109/NCC.2019.8732224
In this paper, we analyze the problem of top-$m$ clustering with access to a noisy oracle. We consider a model in which $n$ nodes belong to $k$ clusters. We have access to an oracle that, when queried with a pair of nodes, returns a binary answer indicating whether they belong to the same cluster, but errs with probability $p$. Our goal is to identify the $m$ largest clusters (the top-$m$ clusters) using the noisy answers from the oracle. This setting was recently studied in [9], which provides an iterative algorithm for the case of complete clustering, i.e., $m=k$. We identify conditions on the relative cluster sizes under which the first $m$ stages of that algorithm recover the top $m$ clusters. We also analyze the query complexity of the algorithm and provide an upper bound that is a function of the number of recovered clusters $m$ and the sizes of the top clusters.
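To make the query model concrete, the following is a minimal Python sketch of the problem setup. It is not the algorithm of [9] or the procedure analyzed in the paper. It assumes that each oracle answer is flipped independently with probability $p$, and it uses a simple greedy majority-vote rule as a stand-in: each new node is compared against up to `r` members of each existing cluster (largest first), joins the first cluster where a strict majority answers "same", and otherwise starts a new cluster. The names `make_noisy_oracle`, `greedy_top_m`, and the sample-size parameter `r` are hypothetical.

```python
import random
from typing import Callable, List, Sequence, Tuple


def make_noisy_oracle(labels: Sequence[int], p: float,
                      rng: random.Random) -> Callable[[int, int], bool]:
    """Hypothetical oracle: reports whether u and v share a cluster.

    Assumption: every answer is flipped independently with probability p.
    """
    def oracle(u: int, v: int) -> bool:
        truth = labels[u] == labels[v]
        return (not truth) if rng.random() < p else truth
    return oracle


def greedy_top_m(n: int, m: int, oracle: Callable[[int, int], bool],
                 r: int) -> Tuple[List[List[int]], int]:
    """Illustrative greedy baseline (not the method of [9]).

    Returns the m largest recovered clusters and the number of queries used.
    """
    clusters: List[List[int]] = []
    queries = 0
    for v in range(n):
        placed = False
        # Try existing clusters from largest to smallest; the sorted list
        # holds the same list objects, so appending mutates `clusters`.
        for c in sorted(clusters, key=len, reverse=True):
            sample = c[:r]  # compare against at most r members
            yes = 0
            for u in sample:
                queries += 1
                if oracle(u, v):
                    yes += 1
            if 2 * yes > len(sample):  # strict majority says "same cluster"
                c.append(v)
                placed = True
                break
        if not placed:
            clusters.append([v])
    clusters.sort(key=len, reverse=True)
    return clusters[:m], queries
```

With $p = 0$ every comparison is correct, so this baseline recovers the clusters exactly and, when the $m$ largest sizes are distinct, returns the true top-$m$ clusters. With $p > 0$, clusters that are still small are judged from very few answers and can be merged or split incorrectly, which hints at why conditions on the relative cluster sizes matter for reliable top-$m$ recovery.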