Terrance Liu, Jingwu Tang, Giuseppe Vietri, Zhiwei Steven Wu
Generating Private Synthetic Data with Genetic Algorithms
arXiv - CS - Other Computer Science, published 2023-06-05, DOI: arxiv-2306.03257
Citations: 0
Abstract
We study the problem of efficiently generating differentially private
synthetic data that approximate the statistical properties of an underlying
sensitive dataset. In recent years, there has been a growing line of work that
approaches this problem using first-order optimization techniques. However,
such techniques can optimize only differentiable objectives,
severely limiting the types of analyses that can be conducted. For example,
first-order mechanisms have mainly succeeded in approximating
statistical queries in the form of marginals over discrete data domains. In
some cases, one can circumvent such issues by relaxing the task's objective to
maintain differentiability. However, even when this is possible, such relaxations
have a fundamental limitation: modifying the minimization
problem introduces additional sources of error. Therefore, we propose Private-GSD,
a private genetic algorithm based on zeroth-order optimization heuristics that
do not require modifying the original objective. As a result, it avoids the
aforementioned limitations of first-order optimization. We empirically evaluate
Private-GSD against baseline algorithms on data derived from the American
Community Survey across a variety of statistics (also known as statistical
queries) for both discrete and real-valued attributes. We show that Private-GSD
outperforms state-of-the-art methods on non-differentiable queries while
matching their accuracy on differentiable ones.
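
To illustrate why a zeroth-order search needs no differentiable relaxation, here is a minimal sketch of a genetic algorithm that fits a synthetic dataset to threshold queries (the fraction of records at or below a cutoff), a statistic whose gradient is zero almost everywhere. This is not the Private-GSD algorithm: it is non-private, and every name, operator, and parameter below (`threshold_answers`, `mutate`, `crossover`, `evolve`, the population sizes) is a hypothetical choice made for illustration. The only assumption it relies on is that candidate datasets can be scored by evaluating the queries directly.

```python
import random


def threshold_answers(data, thresholds):
    """Fraction of records at or below each threshold (non-differentiable in the data)."""
    n = len(data)
    return [sum(1 for x in data if x <= t) / n for t in thresholds]


def max_error(candidate, target, thresholds):
    """Largest absolute gap between the candidate's answers and the target answers."""
    answers = threshold_answers(candidate, thresholds)
    return max(abs(a - b) for a, b in zip(answers, target))


def mutate(candidate, rng):
    """Replace one randomly chosen record with a fresh uniform value in [0, 1)."""
    child = list(candidate)
    child[rng.randrange(len(child))] = rng.random()
    return child


def crossover(a, b, rng):
    """Copy one randomly chosen record from parent b into a copy of parent a."""
    child = list(a)
    child[rng.randrange(len(child))] = b[rng.randrange(len(b))]
    return child


def evolve(target, thresholds, n_records=50, pop_size=20, elite=5,
           generations=200, seed=0):
    """Toy elitist genetic search; returns (best dataset, initial error, final error)."""
    rng = random.Random(seed)

    def score(candidate):
        # The objective is only ever evaluated, never differentiated.
        return max_error(candidate, target, thresholds)

    population = [[rng.random() for _ in range(n_records)]
                  for _ in range(pop_size)]
    initial_error = min(score(c) for c in population)

    for _ in range(generations):
        # Keep the best candidates unchanged, so the best score never gets worse.
        population.sort(key=score)
        elites = population[:elite]
        children = []
        while len(elites) + len(children) < pop_size:
            parent = rng.choice(elites)
            if rng.random() < 0.5:
                children.append(mutate(parent, rng))
            else:
                children.append(crossover(parent, rng.choice(elites), rng))
        population = elites + children

    best = min(population, key=score)
    return best, initial_error, score(best)


if __name__ == "__main__":
    data_rng = random.Random(1)
    # Stand-in for a sensitive dataset; values are made up for the demo.
    sensitive = [data_rng.betavariate(2, 5) for _ in range(500)]
    thresholds = [0.1, 0.2, 0.3, 0.5, 0.7]
    target = threshold_answers(sensitive, thresholds)
    best, before, after = evolve(target, thresholds)
    print(f"max error before: {before:.3f}, after: {after:.3f}")
```

Because the elites are carried over unchanged each generation, the final error can never exceed the initial one. A private variant would have to score candidates against privatized answers rather than the exact ones; that step is deliberately left out here.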