Stealing Your Data from Compressed Machine Learning Models

2020 57th ACM/IEEE Design Automation Conference (DAC). Pub Date: 2020-07-01. DOI: 10.1109/DAC18072.2020.9218633

Nuo Xu, Qi Liu, Tao Liu, Zihao Liu, Xiaochen Guo, Wujie Wen



Abstract

Machine learning models are widely deployed in real-world tasks. When a non-expert data holder uses a third-party machine learning service for model training, preserving the confidentiality of the training data is critical. In this paper, we explore, for the first time, the potential privacy leakage in a scenario where a malicious ML provider offers the data holder customized training code that includes model compression, which is essential in practical deployment. The provider cannot access the training process, which is hosted by a secured third party, but can query the models once they are publicly released. As a result, the adversary can extract sensitive training data with high quality, even from deeply compressed models tailored for resource-limited devices. Our investigation shows that existing compression techniques, such as quantization, can serve as a defense against such an attack by simultaneously degrading model accuracy and the quality of the memorized data. To overcome this defense, we make an initial attempt to design a simple but stealthy quantized correlation encoding attack flow from an adversary's perspective. Three integrated components are developed accordingly: data pre-processing, layer-wise data-weight correlation regularization, and data-aware quantization. Extensive experimental results show that our framework preserves both the evasiveness and the effectiveness of stealing data from compressed models.
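The abstract's observation that plain quantization degrades memorized data can be illustrated with a small, self-contained sketch. This is not the authors' implementation. It assumes a generic Pearson-correlation penalty as a stand-in for the paper's layer-wise data-weight correlation regularization, plain uniform quantization as the compression step, and synthetic "pixel" values as the secret. All function names (`pearson`, `correlation_penalty`, `uniform_quantize`, `decode`) are hypothetical and chosen only for illustration.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


def correlation_penalty(weights, secret, lam=1.0):
    """Assumed regularizer: a term added to the task loss that decreases
    as the weights become more correlated with the secret values."""
    return -lam * abs(pearson(weights, secret))


def uniform_quantize(weights, bits):
    """Uniform quantization onto 2**bits evenly spaced levels in [min, max]."""
    lo, hi = min(weights), max(weights)
    step = (hi - lo) / (2 ** bits - 1)
    return [lo + round((w - lo) / step) * step for w in weights]


def decode(weights, lo=0.0, hi=255.0):
    """Attacker-side decoding: linearly rescale weights to the pixel range."""
    wmin, wmax = min(weights), max(weights)
    return [lo + (w - wmin) / (wmax - wmin) * (hi - lo) for w in weights]


def mean_abs_error(a, b):
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


random.seed(0)
# Synthetic stand-in for sensitive training data (e.g. 8-bit pixel values).
secret = [random.randint(0, 255) for _ in range(1000)]
# Weights as they might look after training with the penalty above:
# nearly a linear function of the secret, plus a little noise.
weights = [0.001 * s - 0.1 + random.gauss(0, 0.002) for s in secret]

err_full = mean_abs_error(decode(weights), secret)
err_2bit = mean_abs_error(decode(uniform_quantize(weights, 2)), secret)
print("reconstruction error, full precision:", err_full)
print("reconstruction error, 2-bit weights: ", err_2bit)
```

In this toy setting the 2-bit weights can only take four distinct values, so the decoded data is much coarser than what the full-precision weights yield. That loss of fidelity is the defensive effect of quantization that the abstract describes, and the effect that the proposed data-aware quantization is designed to overcome.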


