Generating Private Synthetic Data with Genetic Algorithms

Terrance Liu, Jingwu Tang, Giuseppe Vietri, Zhiwei Steven Wu
arXiv - CS - Other Computer Science, published 2023-06-05. DOI: arxiv-2306.03257 (https://doi.org/arxiv-2306.03257)

Abstract

We study the problem of efficiently generating differentially private synthetic data that approximate the statistical properties of an underlying sensitive dataset. A growing line of recent work approaches this problem with first-order optimization techniques. However, such techniques can only optimize differentiable objectives, which severely limits the types of analyses that can be conducted. For example, first-order mechanisms have primarily succeeded at approximating statistical queries in the form of marginals over discrete data domains. In some cases, one can circumvent this restriction by relaxing the task's objective to maintain differentiability. However, even when a relaxation is possible, the modifications to the minimization problem become additional sources of error. We therefore propose Private-GSD, a private genetic algorithm based on zeroth-order optimization heuristics that do not require modifying the original objective, and so avoids these limitations of first-order optimization. We empirically evaluate Private-GSD against baseline algorithms on data derived from the American Community Survey across a variety of statistics (otherwise known as statistical queries) for both discrete and real-valued attributes. We show that Private-GSD outperforms state-of-the-art methods on non-differentiable queries while matching their accuracy on differentiable ones.
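
To make the zeroth-order idea concrete, here is a minimal sketch of a genetic search that fits a synthetic dataset to noisy answers of threshold queries, which are not differentiable in the data. It is not the authors' Private-GSD implementation: the function names, the mutation and crossover rules, the elite size, and the noise scale `sigma` are all illustrative assumptions, and `sigma` is not calibrated to any privacy budget here.

```python
import random

def answer_queries(data, queries):
    """Answer each statistical query: the fraction of rows satisfying a predicate."""
    n = len(data)
    return [sum(1 for row in data if q(row)) / n for q in queries]

def noisy_answers(data, queries, sigma, rng):
    """Perturb the true answers with Gaussian noise.
    ASSUMPTION: sigma is a placeholder; a real mechanism must calibrate it
    to the query sensitivity and the privacy budget."""
    return [a + rng.gauss(0.0, sigma) for a in answer_queries(data, queries)]

def loss(candidate, queries, targets):
    """Squared error between the candidate's answers and the noisy targets.
    Only function evaluations are needed, never gradients."""
    answers = answer_queries(candidate, queries)
    return sum((a - t) ** 2 for a, t in zip(answers, targets))

def mutate(candidate, domain, rng):
    """Copy the candidate and resample one attribute of one row."""
    child = [list(row) for row in candidate]
    i = rng.randrange(len(child))
    j = rng.randrange(len(domain))
    lo, hi = domain[j]
    child[i][j] = rng.uniform(lo, hi)
    return child

def crossover(a, b, rng):
    """Copy candidate a and overwrite one of its rows with a row from b."""
    child = [list(row) for row in a]
    child[rng.randrange(len(child))] = list(b[rng.randrange(len(b))])
    return child

def evolve(queries, targets, domain, n_rows=30, pop_size=20,
           elite=5, generations=100, seed=0):
    """Elitist genetic search over whole synthetic datasets."""
    rng = random.Random(seed)
    population = [
        [[rng.uniform(lo, hi) for lo, hi in domain] for _ in range(n_rows)]
        for _ in range(pop_size)
    ]
    for _ in range(generations):
        population.sort(key=lambda c: loss(c, queries, targets))
        elites = population[:elite]  # the best candidates survive unchanged
        children = []
        while len(elites) + len(children) < pop_size:
            if rng.random() < 0.5:
                children.append(mutate(rng.choice(elites), domain, rng))
            else:
                a, b = rng.sample(elites, 2)
                children.append(crossover(a, b, rng))
        population = elites + children
    return min(population, key=lambda c: loss(c, queries, targets))

# Hypothetical usage: two real-valued attributes in [0, 1] and three
# threshold queries.
data_rng = random.Random(1)
sensitive = [[data_rng.random(), data_rng.random()] for _ in range(500)]
domain = [(0.0, 1.0), (0.0, 1.0)]
queries = [
    lambda r: r[0] < 0.3,
    lambda r: r[1] > 0.7,
    lambda r: r[0] < 0.5 and r[1] < 0.5,
]
targets = noisy_answers(sensitive, queries, sigma=0.01, rng=data_rng)
synthetic = evolve(queries, targets, domain)
print(loss(synthetic, queries, targets))
```

In this sketch the sensitive data is touched only once, when the noisy targets are computed. The search itself sees only those targets, so it is post-processing of an already perturbed release. Because the best candidates are carried over unchanged, the best loss in the population never increases from one generation to the next.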