SecureML: A System for Scalable Privacy-Preserving Machine Learning

2017 IEEE Symposium on Security and Privacy (SP), pp. 19-38. Pub Date: 2017-05-22. DOI: 10.1109/SP.2017.12

Payman Mohassel, Yupeng Zhang

Citations: 1370

Abstract

Machine learning is widely used in practice to produce predictive models for applications such as image processing and speech and text recognition. These models are more accurate when trained on large amounts of data collected from different sources. However, such massive data collection raises privacy concerns. In this paper, we present new and efficient protocols for privacy-preserving machine learning for linear regression, logistic regression, and neural network training using the stochastic gradient descent method. Our protocols fall in the two-server model, where data owners distribute their private data among two non-colluding servers that train various models on the joint data using secure two-party computation (2PC). We develop new techniques to support secure arithmetic operations on shared decimal numbers, and propose MPC-friendly alternatives to non-linear functions such as sigmoid and softmax that are superior to those in prior work. We implement our system in C++. Our experiments validate that our protocols are several orders of magnitude faster than state-of-the-art implementations of privacy-preserving linear and logistic regression, and that they scale to millions of data samples with thousands of features. We also implement the first privacy-preserving system for training neural networks.
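The abstract names three ingredients without showing them: data split between two servers, arithmetic on shared decimal numbers, and an MPC-friendly replacement for sigmoid. The Python sketch below illustrates the general idea only and is not the authors' C++ implementation. The ring size (64 bits), the 13 fractional bits, the trusted-dealer `beaver_triple` helper, and all function names are assumptions made for illustration. The piecewise-linear activation is likewise not given in the abstract and should be checked against the paper body.

```python
import secrets
from typing import Tuple

L_BITS = 64                  # assumed ring Z_{2^64}
MOD = 1 << L_BITS
FRAC_BITS = 13               # assumed number of fractional bits

Share = Tuple[int, int]      # (share held by server 0, share held by server 1)

def encode(x: float) -> int:
    """Encode a decimal number as a fixed-point ring element."""
    return round(x * (1 << FRAC_BITS)) % MOD

def decode(v: int) -> float:
    """Read a ring element back as a signed fixed-point number."""
    if v >= MOD // 2:
        v -= MOD
    return v / (1 << FRAC_BITS)

def share(v: int) -> Share:
    """Additively share v: each share alone is uniformly random."""
    s0 = secrets.randbelow(MOD)
    return s0, (v - s0) % MOD

def reconstruct(s0: int, s1: int) -> int:
    return (s0 + s1) % MOD

def beaver_triple() -> Tuple[Share, Share, Share]:
    """Hypothetical trusted dealer standing in for an offline phase."""
    a, b = secrets.randbelow(MOD), secrets.randbelow(MOD)
    return share(a), share(b), share(a * b % MOD)

def mul_shares(x: Share, y: Share) -> Share:
    """Multiply two shared values using one multiplication triple."""
    (a0, a1), (b0, b1), (c0, c1) = beaver_triple()
    # The servers open only the masked values e = x - a and d = y - b.
    e = (x[0] - a0 + x[1] - a1) % MOD
    d = (y[0] - b0 + y[1] - b1) % MOD
    z0 = (e * b0 + d * a0 + c0) % MOD
    z1 = (e * b1 + d * a1 + c1 + e * d) % MOD
    return z0, z1

def truncate_shares(z: Share) -> Share:
    """Each server drops FRAC_BITS from its own share, with no interaction.
    The result is off by at most one unit in the last place, except with
    small probability when the shared value is small relative to the ring."""
    z0, z1 = z
    t0 = z0 >> FRAC_BITS
    t1 = (MOD - ((MOD - z1) >> FRAC_BITS)) % MOD
    return t0, t1

def mpc_friendly_activation(u: float) -> float:
    """Piecewise-linear stand-in for sigmoid (assumed form, shown in the clear)."""
    if u < -0.5:
        return 0.0
    if u > 0.5:
        return 1.0
    return u + 0.5

if __name__ == "__main__":
    x, y = share(encode(3.25)), share(encode(-1.5))
    product = truncate_shares(mul_shares(x, y))
    print(decode(reconstruct(*product)))   # close to -4.875
    print(mpc_friendly_activation(0.2))
```

A product of two fixed-point values carries twice the fractional bits, so the truncation step is what keeps repeated multiplications (as in gradient descent) from overflowing the ring. The activation is attractive in this setting because it needs only comparisons and one addition, whereas the exponential in sigmoid is costly to evaluate inside a 2PC protocol.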
