Private measures, random walks, and synthetic data

Impact factor 1.5; Zone 1, Mathematics; JCR Q2, Statistics & Probability
March Boedihardjo, Thomas Strohmer, Roman Vershynin
Probability Theory and Related Fields, published 2024-04-20 (journal article)
DOI: 10.1007/s00440-024-01279-z
https://doi.org/10.1007/s00440-024-01279-z
Citation count: 0

Abstract

Differential privacy is a mathematical concept that provides an information-theoretic security guarantee. While differential privacy has emerged as a de facto standard for guaranteeing privacy in data sharing, the known mechanisms to achieve it come with some serious limitations. Utility guarantees are usually provided only for a fixed, a priori specified set of queries. Moreover, there are no utility guarantees for more complex—but very common—machine learning tasks such as clustering or classification. In this paper we overcome some of these limitations. Working with metric privacy, a powerful generalization of differential privacy, we develop a polynomial-time algorithm that creates a private measure from a data set. This private measure allows us to efficiently construct private synthetic data that are accurate for a wide range of statistical analysis tools. Moreover, we prove an asymptotically sharp min-max result for private measures and synthetic data in general compact metric spaces, for any fixed privacy budget \(\varepsilon \) bounded away from zero. A key ingredient in our construction is a new superregular random walk, whose joint distribution of steps is as regular as that of independent random variables, yet which deviates from the origin logarithmically slowly.
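
For readers unfamiliar with the two privacy notions named in the abstract, their standard definitions can be written out as below. This is a sketch in the usual textbook formulation, not notation taken from the paper: the symbols \(\mathcal{M}\) (a randomized mechanism), \(X, X'\) (input data sets), \(S\) (a measurable set of outputs), and \(d\) (a metric on the space of data sets) are assumptions introduced here for illustration.

```latex
% epsilon-differential privacy: for all data sets X, X' that differ
% in a single record, and all measurable output sets S,
\[
  \mathbb{P}\{\mathcal{M}(X) \in S\}
    \le e^{\varepsilon}\, \mathbb{P}\{\mathcal{M}(X') \in S\}.
\]
% epsilon-metric privacy: for all data sets X, X' and all measurable
% output sets S, the bound scales with the distance d(X, X'),
\[
  \mathbb{P}\{\mathcal{M}(X) \in S\}
    \le e^{\varepsilon\, d(X, X')}\, \mathbb{P}\{\mathcal{M}(X') \in S\}.
\]
```

If \(d\) is taken to be the Hamming distance between data sets (the number of records in which they differ), the second condition is equivalent to the first, which is the sense in which metric privacy generalizes differential privacy.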


Source journal
Probability Theory and Related Fields (Mathematics: Statistics & Probability)
CiteScore: 3.70
Self-citation rate: 5.00%
Articles published: 71
Review time: 6-12 weeks
Journal description: Probability Theory and Related Fields publishes research papers in modern probability theory and its various fields of application. Subjects of interest include mathematical statistical physics, mathematical statistics, mathematical biology, theoretical computer science, and applications of probability theory to other areas of mathematics such as combinatorics, analysis, ergodic theory, and geometry. Survey papers on emerging areas of importance may be considered for publication. The main languages of publication are English, French, and German.