{"title":"A Methodology for Formalizing Model-Inversion Attacks","authors":"Xi Wu, Matt Fredrikson, S. Jha, J. Naughton","doi":"10.1109/CSF.2016.32","DOIUrl":null,"url":null,"abstract":"Confidentiality of training data induced by releasing machine-learning models, and has recently received increasing attention. Motivated by existing MI attacks and other previous attacks that turn out to be MI \"in disguise,\" this paper initiates a formal study of MI attacks by presenting a game-based methodology. Our methodology uncovers a number of subtle issues, and devising a rigorous game-based definition, analogous to those in cryptography, is an interesting avenue for future work. We describe methodologies for two types of attacks. The first is for black-box attacks, which consider an adversary who infers sensitive values with only oracle access to a model. The second methodology targets the white-box scenario where an adversary has some additional knowledge about the structure of a model. For the restricted class of Boolean models and black-box attacks, we characterize model invertibility using the concept of influence from Boolean analysis in the noiseless case, and connect model invertibility with stable influence in the noisy case. Interestingly, we also discovered an intriguing phenomenon, which we call \"invertibility interference,\" where a highly invertible model quickly becomes highly non-invertible by adding little noise. For the white-box case, we consider a common phenomenon in machine-learning models where the model is a sequential composition of several sub-models. We show, quantitatively, that even very restricted communication between layers could leak a significant amount of information. Perhaps more importantly, our study also unveils unexpected computational power of these restricted communication channels, which, to the best of our knowledge, were not previously known.","PeriodicalId":6500,"journal":{"name":"2016 IEEE 29th Computer Security Foundations Symposium (CSF)","volume":"30 6","pages":"355-370"},"PeriodicalIF":0.0000,"publicationDate":"2016-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"143","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2016 IEEE 29th Computer Security Foundations Symposium (CSF)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/CSF.2016.32","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 143
Abstract
The confidentiality of training data can be compromised by releasing machine-learning models, a risk that has recently received increasing attention. Motivated by existing model-inversion (MI) attacks and other previous attacks that turn out to be MI "in disguise," this paper initiates a formal study of MI attacks by presenting a game-based methodology. Our methodology uncovers a number of subtle issues, and devising a rigorous game-based definition, analogous to those in cryptography, is an interesting avenue for future work. We describe methodologies for two types of attacks. The first is for black-box attacks, which consider an adversary who infers sensitive values with only oracle access to a model. The second methodology targets the white-box scenario, where an adversary has some additional knowledge about the structure of a model. For the restricted class of Boolean models and black-box attacks, we characterize model invertibility using the concept of influence from Boolean analysis in the noiseless case, and connect model invertibility with stable influence in the noisy case. Interestingly, we also discovered an intriguing phenomenon, which we call "invertibility interference," where a highly invertible model quickly becomes highly non-invertible when only a small amount of noise is added. For the white-box case, we consider a common situation in machine-learning models where the model is a sequential composition of several sub-models. We show, quantitatively, that even very restricted communication between layers can leak a significant amount of information. Perhaps more importantly, our study also unveils unexpected computational power in these restricted communication channels, which, to the best of our knowledge, was not previously known.
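For readers unfamiliar with the Boolean-analysis terminology used above: the influence of coordinate i on a Boolean function f is standardly defined as Inf_i[f] = Pr_x[f(x) != f(x^(i))], the probability over a uniform input x that flipping the i-th bit changes the output. The sketch below is not taken from the paper; the function name and the toy majority model are illustrative assumptions. It shows how an adversary with only oracle access to a Boolean model could estimate this quantity by sampling, which is the kind of black-box quantity the abstract connects to invertibility.

```python
import random

def estimate_influence(f, n, i, samples=10000, seed=0):
    """Estimate Inf_i[f] = Pr_x[ f(x) != f(x with bit i flipped) ]
    by querying the model f as a black-box oracle on uniform inputs."""
    rng = random.Random(seed)
    flips = 0
    for _ in range(samples):
        x = [rng.randint(0, 1) for _ in range(n)]
        y = x[:]          # copy the input, then flip the i-th coordinate
        y[i] ^= 1
        if f(x) != f(y):  # two oracle queries per sample
            flips += 1
    return flips / samples

if __name__ == "__main__":
    # Toy model: a 3-input majority function (each bit has influence 1/2).
    maj3 = lambda x: int(x[0] + x[1] + x[2] >= 2)
    for i in range(3):
        print(f"estimated influence of bit {i}: {estimate_influence(maj3, 3, i):.3f}")
```

The "stable influence" mentioned for the noisy case is a noise-attenuated variant of this definition; see the paper itself for the precise formalization used there.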