On the Privacy of Noisy Stochastic Gradient Descent for Convex Optimization

IF 1.2 · CAS Tier 3 (Computer Science) · Q3 COMPUTER SCIENCE, THEORY & METHODS
Jason M. Altschuler, Jinho Bok, Kunal Talwar
{"title":"论凸优化的噪声随机梯度下降的隐私性","authors":"Jason M. Altschuler, Jinho Bok, Kunal Talwar","doi":"10.1137/23m1556538","DOIUrl":null,"url":null,"abstract":"SIAM Journal on Computing, Volume 53, Issue 4, Page 969-1001, August 2024. <br/> Abstract. A central issue in machine learning is how to train models on sensitive user data. Industry has widely adopted a simple algorithm: Stochastic Gradient Descent (SGD) with noise (a.k.a. Stochastic Gradient Langevin Dynamics). However, foundational theoretical questions about this algorithm’s privacy loss remain open—even in the seemingly simple setting of smooth convex losses over a bounded domain. Our main result resolves these questions: for a large range of parameters, we characterize the differential privacy up to a constant factor. This result reveals that all previous analyses for this setting have the wrong qualitative behavior. Specifically, while previous privacy analyses increase ad infinitum in the number of iterations, we show that after a small burn-in period, running SGD longer leaks no further privacy. Our analysis departs from previous approaches based on fast mixing, instead using techniques based on optimal transport (namely, Privacy Amplification by Iteration) and the Sampled Gaussian Mechanism (namely, Privacy Amplification by Sampling). Our techniques readily extend to other settings, e.g., strongly convex losses, nonuniform stepsizes, arbitrary batch sizes, and random or cyclic choice of batches.","PeriodicalId":49532,"journal":{"name":"SIAM Journal on Computing","volume":"18 1","pages":""},"PeriodicalIF":1.2000,"publicationDate":"2024-07-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"On the Privacy of Noisy Stochastic Gradient Descent for Convex Optimization\",\"authors\":\"Jason M. Altschuler, Jinho Bok, Kunal Talwar\",\"doi\":\"10.1137/23m1556538\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"SIAM Journal on Computing, Volume 53, Issue 4, Page 969-1001, August 2024. <br/> Abstract. A central issue in machine learning is how to train models on sensitive user data. Industry has widely adopted a simple algorithm: Stochastic Gradient Descent (SGD) with noise (a.k.a. Stochastic Gradient Langevin Dynamics). However, foundational theoretical questions about this algorithm’s privacy loss remain open—even in the seemingly simple setting of smooth convex losses over a bounded domain. Our main result resolves these questions: for a large range of parameters, we characterize the differential privacy up to a constant factor. This result reveals that all previous analyses for this setting have the wrong qualitative behavior. Specifically, while previous privacy analyses increase ad infinitum in the number of iterations, we show that after a small burn-in period, running SGD longer leaks no further privacy. Our analysis departs from previous approaches based on fast mixing, instead using techniques based on optimal transport (namely, Privacy Amplification by Iteration) and the Sampled Gaussian Mechanism (namely, Privacy Amplification by Sampling). 
Our techniques readily extend to other settings, e.g., strongly convex losses, nonuniform stepsizes, arbitrary batch sizes, and random or cyclic choice of batches.\",\"PeriodicalId\":49532,\"journal\":{\"name\":\"SIAM Journal on Computing\",\"volume\":\"18 1\",\"pages\":\"\"},\"PeriodicalIF\":1.2000,\"publicationDate\":\"2024-07-19\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"SIAM Journal on Computing\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://doi.org/10.1137/23m1556538\",\"RegionNum\":3,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q3\",\"JCRName\":\"COMPUTER SCIENCE, THEORY & METHODS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"SIAM Journal on Computing","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.1137/23m1556538","RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"COMPUTER SCIENCE, THEORY & METHODS","Score":null,"Total":0}
Citations: 0

Abstract

SIAM Journal on Computing, Volume 53, Issue 4, Page 969-1001, August 2024.
A central issue in machine learning is how to train models on sensitive user data. Industry has widely adopted a simple algorithm: Stochastic Gradient Descent (SGD) with noise (a.k.a. Stochastic Gradient Langevin Dynamics). However, foundational theoretical questions about this algorithm’s privacy loss remain open—even in the seemingly simple setting of smooth convex losses over a bounded domain. Our main result resolves these questions: for a large range of parameters, we characterize the differential privacy up to a constant factor. This result reveals that all previous analyses for this setting have the wrong qualitative behavior. Specifically, while previous privacy analyses increase ad infinitum in the number of iterations, we show that after a small burn-in period, running SGD longer leaks no further privacy. Our analysis departs from previous approaches based on fast mixing, instead using techniques based on optimal transport (namely, Privacy Amplification by Iteration) and the Sampled Gaussian Mechanism (namely, Privacy Amplification by Sampling). Our techniques readily extend to other settings, e.g., strongly convex losses, nonuniform stepsizes, arbitrary batch sizes, and random or cyclic choice of batches.
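For readers unfamiliar with the algorithm the abstract refers to, noisy projected SGD performs the update x_{t+1} = Proj_K(x_t − η(g_t + Z_t)), where g_t is a stochastic gradient of the loss and Z_t ~ N(0, σ²I). Below is a minimal Python/NumPy sketch of this loop; the parameter names (grad_fn, eta, sigma, radius, num_iters, batch_size) are illustrative choices, not notation from the paper, and the Euclidean ball stands in for an arbitrary bounded convex domain K.

```python
import numpy as np

def project_ball(x, radius):
    """Euclidean projection onto the ball of the given radius,
    a simple stand-in for the bounded convex domain K."""
    norm = np.linalg.norm(x)
    return x if norm <= radius else x * (radius / norm)

def noisy_sgd(grad_fn, data, x0, eta=0.1, sigma=1.0, radius=1.0,
              num_iters=1000, batch_size=32, rng=None):
    """Noisy projected SGD (a.k.a. Stochastic Gradient Langevin Dynamics):
        x_{t+1} = Proj_K( x_t - eta * (batch gradient + Gaussian noise) ).
    grad_fn(x, batch) should return the average gradient of a smooth
    convex loss over the batch; data is a NumPy array of examples.
    """
    rng = np.random.default_rng() if rng is None else rng
    x = project_ball(np.asarray(x0, dtype=float), radius)
    n = len(data)
    for _ in range(num_iters):
        # Random choice of batch; the paper's analysis also covers cyclic batches.
        batch = data[rng.choice(n, size=batch_size, replace=False)]
        g = grad_fn(x, batch)
        z = rng.normal(0.0, sigma, size=x.shape)  # isotropic Gaussian noise for privacy
        x = project_ball(x - eta * (g + z), radius)
    return x
```

With grad_fn computing the average gradient of a smooth convex loss (e.g., logistic loss), this is the kind of loop whose differential privacy the paper characterizes: its main result says the privacy loss stops growing after a burn-in number of iterations, rather than increasing indefinitely with num_iters.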
Source journal
SIAM Journal on Computing (Engineering & Technology, Computer Science: Theory & Methods)
CiteScore: 4.60
Self-citation rate: 0.00%
Articles per year: 68
Review time: 6-12 weeks
Journal description: The SIAM Journal on Computing aims to provide coverage of the most significant work going on in the mathematical and formal aspects of computer science and nonnumerical computing. Submissions must be clearly written and make a significant technical contribution. Topics include but are not limited to analysis and design of algorithms, algorithmic game theory, data structures, computational complexity, computational algebra, computational aspects of combinatorics and graph theory, computational biology, computational geometry, computational robotics, the mathematical aspects of programming languages, artificial intelligence, computational learning, databases, information retrieval, cryptography, networks, distributed computing, parallel algorithms, and computer architecture.