Best Linear Unbiased Estimate from Privatized Histograms
Jordan Awan, Adam Edwards, Paul Bartholomew, Andrew Sillers
arXiv - STAT - Computation, 2024-09-06
https://doi.org/arxiv-2409.04387
Abstract
In differential privacy (DP) mechanisms, it can be beneficial to release
"redundant" outputs, in the sense that the same quantity can be estimated from
several different combinations of privatized values. Indeed, this structure is present
in the DP 2020 Decennial Census products published by the U.S. Census Bureau.
With this structure, the DP output can be improved by enforcing
self-consistency (i.e., estimators obtained by combining different values
result in the same estimate), and we show that the minimum-variance processing
is a linear projection. However, standard projection algorithms are too
computationally expensive in terms of both memory and execution time for
applications such as the Decennial Census. We propose the Scalable Efficient
Algorithm for Best Linear Unbiased Estimate (SEA BLUE), based on a two-step
process of aggregation and differencing that 1) enforces self-consistency
through a linear and unbiased procedure, 2) is efficient in both computation and
memory, 3) achieves the minimum-variance solution under certain structural
assumptions, and 4) is empirically shown to be robust to violations of these
structural assumptions. We propose three methods of calculating confidence
intervals from our estimates, under various assumptions. We apply SEA BLUE to
two 2010 Census demonstration products, illustrating its scalability and
validity.
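
The claim that minimum-variance processing is a linear projection can be illustrated on a toy release. The sketch below is not the SEA BLUE algorithm; it is the standard dense generalized least squares projection, applied to a hypothetical histogram with two cells whose total is also released. It assumes independent, zero-mean noise with known variances. The function name blue_two_cells and the numbers are illustrative only and do not come from the paper.

```python
from fractions import Fraction


def blue_two_cells(y, variances):
    """GLS estimate of (x1, x2) from noisy releases of x1, x2 and x1 + x2.

    Assumes independent zero-mean noise with the given (known) variances.
    """
    # Rows of the design matrix A: which true cells each released value sums.
    A = [(1, 0), (0, 1), (1, 1)]
    w = [Fraction(1) / Fraction(v) for v in variances]  # inverse-variance weights

    # Normal equations (A^T W A) x = A^T W y, a 2x2 system in this toy case.
    g11 = sum(wi * a[0] * a[0] for wi, a in zip(w, A))
    g12 = sum(wi * a[0] * a[1] for wi, a in zip(w, A))
    g22 = sum(wi * a[1] * a[1] for wi, a in zip(w, A))
    b1 = sum(wi * a[0] * yi for wi, a, yi in zip(w, A, y))
    b2 = sum(wi * a[1] * yi for wi, a, yi in zip(w, A, y))

    # The inverse of the 2x2 Gram matrix is the covariance of the estimate.
    det = g11 * g22 - g12 * g12
    cov = ((g22 / det, -g12 / det), (-g12 / det, g11 / det))
    x1 = cov[0][0] * b1 + cov[0][1] * b2
    x2 = cov[1][0] * b1 + cov[1][1] * b2
    return (x1, x2), cov


# Hypothetical release: the noisy cells sum to 30 but the noisy total says 33.
estimate, cov = blue_two_cells(y=[10, 20, 33], variances=[1, 1, 1])
x1, x2 = estimate
total = x1 + x2  # the total is derived from the cells, so it is self-consistent
var_total = cov[0][0] + 2 * cov[0][1] + cov[1][1]

print(x1, x2, total)         # 11 21 32
print(cov[0][0], var_total)  # 2/3 2/3
```

In this toy case the projected cells and total agree with each other, and each has variance 2/3 rather than the raw release variance of 1. The dense solve shown here is exactly the kind of computation that becomes too expensive at Census scale, which is the gap the abstract says SEA BLUE addresses.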