Local, Private, Efficient Protocols for Succinct Histograms

Proceedings of the Forty-Seventh Annual ACM Symposium on Theory of Computing. Pub Date: 2015-04-17. DOI: 10.1145/2746539.2746632

Raef Bassily, Adam D. Smith


Citations: 373

Abstract

We give efficient protocols and matching accuracy lower bounds for frequency estimation in the local model for differential privacy. In this model, individual users randomize their data themselves, sending differentially private reports to an untrusted server that aggregates them. We study protocols that produce a succinct histogram representation of the data. A succinct histogram is a list of the most frequent items in the data (often called "heavy hitters") along with estimates of their frequencies; the frequency of all other items is implicitly estimated as 0. If there are $n$ users whose items come from a universe of size $d$, our protocols run in time polynomial in $n$ and $\log(d)$. With high probability, they estimate the frequency of every item up to error $O(\sqrt{\log(d)/(\varepsilon^2 n)})$. Moreover, we show that this much error is necessary, regardless of computational efficiency, and even for the simple setting where only one item appears with significant frequency in the data set. Previous protocols (Mishra and Sandler, 2006; Hsu, Khanna and Roth, 2012) for this task either ran in time $\Omega(d)$ or had much worse error (about $\sqrt[6]{\log(d)/(\varepsilon^2 n)}$), and the only known lower bound on error was $\Omega(1/\sqrt{n})$. We also adapt a result of McGregor et al. (2010) to the local setting. In a model with public coins, we show that each user need only send 1 bit to the server. For all known local protocols (including ours), the transformation preserves computational efficiency.
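To make the local model concrete, here is a minimal Python sketch of classical binary randomized response for estimating the frequency of a single fixed item. This is not the protocol from the paper. It is a textbook baseline, included only to illustrate users randomizing their own data and an untrusted server debiasing the aggregate. The function names (`randomize_bit`, `estimate_frequency`, `simulate`) and all parameter values are assumptions made for illustration.

```python
import math
import random
from typing import Sequence


def randomize_bit(bit: int, epsilon: float, rng: random.Random) -> int:
    """User side: keep the true bit with probability e^eps / (1 + e^eps), else flip it.

    The ratio of output probabilities for the two possible inputs is e^eps,
    so each one-bit report is eps-differentially private.
    """
    p_truth = math.exp(epsilon) / (1.0 + math.exp(epsilon))
    return bit if rng.random() < p_truth else 1 - bit


def estimate_frequency(reports: Sequence[int], epsilon: float) -> float:
    """Server side: debias the mean of the noisy bits.

    If the true frequency is f and p is the keep probability, then
    E[mean of reports] = f * p + (1 - f) * (1 - p), which we invert for f.
    """
    p_truth = math.exp(epsilon) / (1.0 + math.exp(epsilon))
    noisy_mean = sum(reports) / len(reports)
    return (noisy_mean - (1.0 - p_truth)) / (2.0 * p_truth - 1.0)


def simulate(n: int, true_frequency: float, epsilon: float, seed: int = 0) -> float:
    """Hypothetical experiment: n users, a given fraction of whom hold the item."""
    rng = random.Random(seed)
    n_holders = round(n * true_frequency)
    bits = [1] * n_holders + [0] * (n - n_holders)
    reports = [randomize_bit(b, epsilon, rng) for b in bits]
    return estimate_frequency(reports, epsilon)


if __name__ == "__main__":
    # Illustrative values only: 200,000 users, 30% hold the item, eps = 1.
    print(simulate(n=200_000, true_frequency=0.3, epsilon=1.0))
```

For a single item and small $\varepsilon$, the standard deviation of this estimator is on the order of $1/(\varepsilon\sqrt{n})$. The sketch does not address the harder problem the abstract describes, which is estimating every item in a universe of size $d$ in time polynomial in $n$ and $\log(d)$.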
