Privacy-preserving deep learning

2015 53rd Annual Allerton Conference on Communication, Control, and Computing (Allerton). Pub Date: 2015-09-01. DOI: 10.1145/2810103.2813687

R. Shokri, Vitaly Shmatikov


Citations: 1861

Abstract

Deep learning based on artificial neural networks is a very popular approach to modeling, classifying, and recognizing complex data such as images, speech, and text. The unprecedented accuracy of deep learning methods has turned them into the foundation of new AI-based services on the Internet. Commercial companies that collect user data on a large scale have been the main beneficiaries of this trend since the success of deep learning techniques is directly proportional to the amount of data available for training. Massive data collection required for deep learning presents obvious privacy issues. Users' personal, highly sensitive data such as photos and voice recordings is kept indefinitely by the companies that collect it. Users can neither delete it nor restrict the purposes for which it is used. Furthermore, centrally kept data is subject to legal subpoenas and extrajudicial surveillance. Many data owners (for example, medical institutions that may want to apply deep learning methods to clinical records) are prevented by privacy and confidentiality concerns from sharing the data and thus benefitting from large-scale deep learning. In this paper, we present a practical system that enables multiple parties to jointly learn an accurate neural-network model for a given objective without sharing their input datasets. We exploit the fact that the optimization algorithms used in modern deep learning, namely, those based on stochastic gradient descent, can be parallelized and executed asynchronously. Our system lets participants train independently on their own datasets and selectively share small subsets of their models' key parameters during training. This offers an attractive point in the utility/privacy tradeoff space: participants preserve the privacy of their respective data while still benefitting from other participants' models and thus boosting their learning accuracy beyond what is achievable solely on their own inputs. We demonstrate the accuracy of our privacy-preserving deep learning on benchmark datasets.
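The idea of training locally and sharing only a small subset of parameter updates can be illustrated with a toy sketch. This is not the paper's system: it assumes a linear model in place of a neural network, a sequential loop in place of asynchronous participants, and a "largest-magnitude gradient entries" selection rule. The names `local_gradient`, `select_top_k`, `train_selective`, and the in-memory `shared` parameter list are all hypothetical.

```python
def local_gradient(weights, data):
    """Gradient of mean squared error for a linear model y ~ w.x,
    computed only on one participant's private data."""
    grad = [0.0] * len(weights)
    for x, y in data:
        err = sum(w * xi for w, xi in zip(weights, x)) - y
        for j, xi in enumerate(x):
            grad[j] += 2.0 * err * xi / len(data)
    return grad


def select_top_k(grad, k):
    """Keep the k gradient entries with the largest magnitude.
    Returns {index: value}; this selection rule is an assumption."""
    top = sorted(range(len(grad)), key=lambda j: abs(grad[j]), reverse=True)[:k]
    return {j: grad[j] for j in top}


def train_selective(datasets, dim, rounds=200, lr=0.1, k=1):
    """Each participant reads the shared parameters, computes a gradient
    on its own data, and publishes only k selected entries.
    Raw training data never leaves the participant."""
    shared = [0.0] * dim  # hypothetical shared parameter store
    for _ in range(rounds):
        for data in datasets:  # sequential stand-in for asynchronous updates
            weights = list(shared)                 # download current values
            grad = local_gradient(weights, data)   # uses private data only
            for j, g in select_top_k(grad, k).items():
                shared[j] -= lr * g                # upload selected updates
    return shared


if __name__ == "__main__":
    # Two participants with private samples of the same target y = 3*x0 - 2*x1.
    data_a = [([1.0, 0.0], 3.0), ([0.0, 1.0], -2.0)]
    data_b = [([2.0, 0.0], 6.0), ([0.0, 1.0], -2.0)]
    print(train_selective([data_a, data_b], dim=2))
```

In this sketch only `k` of the `dim` gradient entries leave a participant per step, so `k` acts as the knob between sharing less (more privacy, slower progress) and sharing more.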



