{"title":"DP-UTIL:机器学习中差分隐私的综合效用分析","authors":"Ismat Jarin, Birhanu Eshete","doi":"10.1145/3508398.3511513","DOIUrl":null,"url":null,"abstract":"Differential Privacy (DP) has emerged as a rigorous formalism to quantify privacy protection provided by an algorithm that operates on privacy sensitive data. In machine learning (ML), DP has been employed to limit inference/disclosure of training examples. Prior work leveraged DP across the ML pipeline, albeit in isolation, often focusing on mechanisms such as gradient perturbation. In this paper, we present DP-UTIL, a holistic utility analysis framework of DP across the ML pipeline with focus on input perturbation, objective perturbation, gradient perturbation, output perturbation, and prediction perturbation. Given an ML task on privacy-sensitive data, DP-UTIL enables a ML privacy practitioner to perform holistic comparative analysis on the impact of DP in these five perturbation spots, measured in terms of model utility loss, privacy leakage, and the number of truly revealed training samples. We evaluate DP-UTIL over classification tasks on vision, medical, and financial datasets, using two representative learning algorithms (logistic regression and deep neural network) against membership inference attack as a case study attack. One of the highlights of our results is that prediction perturbation consistently achieves the lowest utility loss on all models across all datasets. In logistic regression models, objective perturbation results in lowest privacy leakage compared to other perturbation techniques. For deep neural networks, gradient perturbation results in lowest privacy leakage. Moreover, our results on true revealed records suggest that as privacy leakage increases a differentially private model reveals a greater number of member samples. Overall, our findings suggest that to make informed decisions as to the choice of perturbation mechanisms, a ML privacy practitioner needs to examine the dynamics among optimization techniques (convex vs. non-convex), number of classes, and privacy budget.","PeriodicalId":102306,"journal":{"name":"Proceedings of the Twelfth ACM Conference on Data and Application Security and Privacy","volume":"1 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-12-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"3","resultStr":"{\"title\":\"DP-UTIL: Comprehensive Utility Analysis of Differential Privacy in Machine Learning\",\"authors\":\"Ismat Jarin, Birhanu Eshete\",\"doi\":\"10.1145/3508398.3511513\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Differential Privacy (DP) has emerged as a rigorous formalism to quantify privacy protection provided by an algorithm that operates on privacy sensitive data. In machine learning (ML), DP has been employed to limit inference/disclosure of training examples. Prior work leveraged DP across the ML pipeline, albeit in isolation, often focusing on mechanisms such as gradient perturbation. In this paper, we present DP-UTIL, a holistic utility analysis framework of DP across the ML pipeline with focus on input perturbation, objective perturbation, gradient perturbation, output perturbation, and prediction perturbation. 
Given an ML task on privacy-sensitive data, DP-UTIL enables a ML privacy practitioner to perform holistic comparative analysis on the impact of DP in these five perturbation spots, measured in terms of model utility loss, privacy leakage, and the number of truly revealed training samples. We evaluate DP-UTIL over classification tasks on vision, medical, and financial datasets, using two representative learning algorithms (logistic regression and deep neural network) against membership inference attack as a case study attack. One of the highlights of our results is that prediction perturbation consistently achieves the lowest utility loss on all models across all datasets. In logistic regression models, objective perturbation results in lowest privacy leakage compared to other perturbation techniques. For deep neural networks, gradient perturbation results in lowest privacy leakage. Moreover, our results on true revealed records suggest that as privacy leakage increases a differentially private model reveals a greater number of member samples. Overall, our findings suggest that to make informed decisions as to the choice of perturbation mechanisms, a ML privacy practitioner needs to examine the dynamics among optimization techniques (convex vs. non-convex), number of classes, and privacy budget.\",\"PeriodicalId\":102306,\"journal\":{\"name\":\"Proceedings of the Twelfth ACM Conference on Data and Application Security and Privacy\",\"volume\":\"1 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2021-12-24\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"3\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the Twelfth ACM Conference on Data and Application Security and Privacy\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3508398.3511513\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the Twelfth ACM Conference on Data and Application Security and Privacy","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3508398.3511513","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
DP-UTIL: Comprehensive Utility Analysis of Differential Privacy in Machine Learning
Differential Privacy (DP) has emerged as a rigorous formalism for quantifying the privacy protection provided by an algorithm that operates on privacy-sensitive data. In machine learning (ML), DP has been employed to limit the inference or disclosure of training examples. Prior work has applied DP at individual stages of the ML pipeline, typically in isolation and often focusing on mechanisms such as gradient perturbation. In this paper, we present DP-UTIL, a holistic utility-analysis framework for DP across the ML pipeline, covering input perturbation, objective perturbation, gradient perturbation, output perturbation, and prediction perturbation. Given an ML task on privacy-sensitive data, DP-UTIL enables an ML privacy practitioner to perform a holistic comparative analysis of the impact of DP at these five perturbation spots, measured in terms of model utility loss, privacy leakage, and the number of truly revealed training samples. We evaluate DP-UTIL on classification tasks over vision, medical, and financial datasets, using two representative learning algorithms (logistic regression and a deep neural network) and membership inference as the case-study attack. One highlight of our results is that prediction perturbation consistently achieves the lowest utility loss for all models across all datasets. In logistic regression models, objective perturbation results in the lowest privacy leakage compared to the other perturbation techniques; for deep neural networks, gradient perturbation yields the lowest privacy leakage. Moreover, our results on truly revealed records suggest that as privacy leakage increases, a differentially private model reveals a greater number of member samples. Overall, our findings suggest that to make informed decisions about the choice of perturbation mechanism, an ML privacy practitioner needs to examine the interplay among the optimization technique (convex vs. non-convex), the number of classes, and the privacy budget.
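As a concrete illustration of one of the five perturbation spots named in the abstract, the sketch below shows gradient perturbation in the DP-SGD style for a logistic regression model: per-example gradients are clipped to an L2 bound and Gaussian noise calibrated to that bound is added before each update. This is a minimal sketch under assumptions of our own; the function name dp_sgd_logreg and the hyperparameters clip_norm and noise_multiplier are illustrative, not the authors' implementation or API, and the formal (epsilon, delta) accounting (e.g., via a moments or RDP accountant) is omitted.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def dp_sgd_logreg(X, y, epochs=10, lr=0.1, clip_norm=1.0,
                  noise_multiplier=1.1, batch_size=64, seed=0):
    # X: (n, d) feature matrix; y: (n,) labels in {0, 1}.
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(epochs):
        order = rng.permutation(n)
        for start in range(0, n, batch_size):
            batch = order[start:start + batch_size]
            # Per-example gradients of the logistic loss w.r.t. w, shape (B, d).
            preds = sigmoid(X[batch] @ w)
            grads = (preds - y[batch])[:, None] * X[batch]
            # Clip each example's gradient to L2 norm <= clip_norm.
            norms = np.linalg.norm(grads, axis=1, keepdims=True)
            grads = grads / np.maximum(1.0, norms / clip_norm)
            # Add Gaussian noise scaled by the clipping bound, then average.
            noise = rng.normal(0.0, noise_multiplier * clip_norm, size=d)
            w -= lr * (grads.sum(axis=0) + noise) / len(batch)
    return w

Calling w = dp_sgd_logreg(X, y) on a dataset would return noisily trained weights; the same clip-then-add-noise pattern is what distinguishes gradient perturbation from the input, objective, output, and prediction perturbation spots compared in the paper.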