Can Distributed Uniformity Testing Be Local?
Uri Meir, Dor Minzer, R. Oshman
Proceedings of the 2019 ACM Symposium on Principles of Distributed Computing, 2019-07-16
DOI: 10.1145/3293611.3331613 (https://doi.org/10.1145/3293611.3331613)
Citations: 3
Abstract
In the distributed uniformity testing problem, k servers draw samples from some unknown distribution, and the goal is to determine whether the unknown distribution is uniform or whether it is ε-far from uniform, where ε is a proximity parameter. Each server decides whether to accept or reject, and these decisions are sent to a referee, who makes a final decision based on the servers' local decisions. Uniformity testing is a particularly useful building block, because it is complete for the problem of testing identity to any fixed distribution. It was recently shown that distributing the task of uniformity testing allows each server to draw fewer samples than are needed in the centralized case, but so far the number of samples required for distributed uniformity testing has not been well understood. In this paper we settle this question, and also investigate the cost of using local decision rules, such as rejecting iff at least one server wants to reject (the usual decision rule used in local distributed decision). To answer these questions, we develop a new Fourier-based technique for proving lower bounds on the sample complexity of distribution testing, which lends itself particularly well to the distributed case. Using our technique, we tightly characterize the number of samples required for uniformity testing when the referee can apply any decision function to the servers' local decisions. We also show that if the network rejects whenever one server wants to reject, then the cost of uniformity testing is much higher, and in fact we do not gain compared to the centralized case unless the number of servers is exponential in Ω(1/ε). Finally, we apply our lower bound technique to the case where the referee applies a threshold decision rule, and also generalize a lower bound from [1] for learning an unknown input distribution.
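The model described in the abstract (k servers, one accept/reject bit per server, a referee that combines the bits) can be illustrated with a small simulation. The sketch below is only an illustration under stated assumptions: the local rule is a simple collision-count heuristic, not the tester analyzed in the paper, and all names (`server_accepts`, `referee_and`, `referee_threshold`, `run_protocol`) as well as the `slack` parameter are hypothetical. The one fact it relies on is that q samples from the uniform distribution over n elements have q(q-1)/(2n) colliding pairs in expectation.

```python
import random
from collections import Counter
from typing import Callable, List, Sequence


def collision_count(samples: Sequence[int]) -> int:
    """Number of unordered pairs of equal samples."""
    counts = Counter(samples)
    return sum(c * (c - 1) // 2 for c in counts.values())


def server_accepts(samples: Sequence[int], n: int, slack: float) -> bool:
    """Hypothetical local rule (an assumption, not the paper's tester):
    accept iff the collision count is at most (1 + slack) times the
    expectation under the uniform distribution over n elements."""
    q = len(samples)
    expected_uniform = q * (q - 1) / (2 * n)
    return collision_count(samples) <= (1 + slack) * expected_uniform


def referee_and(decisions: List[bool]) -> bool:
    """Local decision rule: reject iff at least one server rejects."""
    return all(decisions)


def referee_threshold(decisions: List[bool], t: int) -> bool:
    """Threshold rule: reject iff at least t servers reject."""
    rejections = sum(1 for d in decisions if not d)
    return rejections < t


def run_protocol(sampler: Callable[[], int], k: int, q: int, n: int,
                 slack: float, referee: Callable[[List[bool]], bool]) -> bool:
    """Each of k servers draws q samples and sends one bit to the referee.
    Returns True if the referee accepts."""
    decisions = [
        server_accepts([sampler() for _ in range(q)], n, slack)
        for _ in range(k)
    ]
    return referee(decisions)


if __name__ == "__main__":
    n = 1000  # domain size, chosen arbitrarily for the demo
    uniform = lambda: random.randrange(n)
    print(run_protocol(uniform, k=8, q=60, n=n, slack=1.0, referee=referee_and))
```

Passing `lambda d: referee_threshold(d, t)` as the `referee` argument swaps in the threshold rule; the sample counts and the slack here are arbitrary and say nothing about the bounds proved in the paper.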