Privacy accounting and quality control in the Sage differentially private ML platform

Proceedings of the 27th ACM Symposium on Operating Systems Principles. Pub Date: 2019-09-04. DOI: 10.1145/3341301.3359639

Mathias Lécuyer, Riley Spahn, Kiran Vodrahalli, Roxana Geambasu, Daniel J. Hsu


Citations: 7

Abstract

Companies increasingly expose machine learning (ML) models trained over sensitive user data to untrusted domains, such as end-user devices and wide-access model stores. This creates a need to control the data's leakage through these models. We present Sage, a differentially private (DP) ML platform that bounds the cumulative leakage of training data through models. Sage builds upon the rich literature on DP ML algorithms and contributes pragmatic solutions to two of the most pressing systems challenges of global DP: running out of privacy budget and the privacy-utility tradeoff. To address the former, we develop block composition, a new privacy loss accounting method that leverages the growing database regime of ML workloads to keep training models endlessly on a sensitive data stream while enforcing a global DP guarantee for the stream. To address the latter, we develop privacy-adaptive training, a process that trains a model on growing amounts of data and/or with increasing privacy parameters until, with high probability, the model meets developer-configured quality criteria. Sage's methods are designed to integrate with TensorFlow-Extended, Google's open-source ML platform. They illustrate how a systems focus on characteristics of ML workloads enables pragmatic solutions that are not apparent when one focuses on individual algorithms, as most DP ML literature does.
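The two ideas named in the abstract can be illustrated with a minimal Python sketch. It is not Sage's implementation. It assumes the stream is split into blocks that each carry a hypothetical per-block epsilon cap, that privacy loss adds up under basic composition, and that a failed training attempt is retried with a doubled epsilon. The names `BlockAccountant`, `adaptive_train`, and `train_and_score` are invented for illustration, and the sketch omits the high-probability quality test that the abstract describes.

```python
from dataclasses import dataclass, field


@dataclass
class BlockAccountant:
    """Tracks epsilon spent per data block (basic composition: losses add up)."""
    epsilon_cap: float                         # hypothetical per-block budget
    spent: dict = field(default_factory=dict)  # block_id -> epsilon used so far

    def add_block(self, block_id):
        # A newly arrived block starts with its full budget available.
        self.spent.setdefault(block_id, 0.0)

    def remaining(self, block_id):
        return self.epsilon_cap - self.spent[block_id]

    def usable_blocks(self, epsilon):
        # Blocks that can still absorb a query costing `epsilon`.
        return [b for b in self.spent if self.remaining(b) >= epsilon - 1e-12]

    def charge(self, block_ids, epsilon):
        # Refuse the whole request if any block would exceed its cap.
        if any(self.remaining(b) < epsilon - 1e-12 for b in block_ids):
            raise ValueError("insufficient privacy budget on a requested block")
        for b in block_ids:
            self.spent[b] += epsilon


def adaptive_train(accountant, train_and_score, target, eps_start, eps_max):
    """Retry training with a doubled epsilon until the score meets `target`.

    `train_and_score(blocks, eps)` is a caller-supplied stand-in for a DP
    training pipeline that returns a quality score (assumed, not from Sage).
    Returns (eps, blocks, score) on success, or None if the budget runs out.
    """
    eps = eps_start
    while eps <= eps_max:
        blocks = accountant.usable_blocks(eps)
        if not blocks:
            return None                      # wait for new blocks to arrive
        accountant.charge(blocks, eps)       # every attempt consumes budget
        score = train_and_score(blocks, eps)
        if score >= target:
            return eps, blocks, score
        eps *= 2                             # assumed escalation schedule
    return None
```

Because each block is charged separately, a block that arrives later starts with its full budget even when older blocks are nearly exhausted, which is what allows training to continue on a growing stream under a fixed per-block cap.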

