Differentially Private Noisy Search with Applications to Anomaly Detection (Abstract)

D. M. Bittner, A. Sarwate, R. Wright
{"title":"Differentially Private Noisy Search with Applications to Anomaly Detection (Abstract)","authors":"D. M. Bittner, A. Sarwate, R. Wright","doi":"10.1145/3128572.3140456","DOIUrl":null,"url":null,"abstract":"We consider the problem of privacy-sensitive anomaly detection - screening to detect individuals, behaviors, areas, or data samples of high interest. What defines an anomaly is context-specific; for example, a spoofed rather than genuine user attempting to log in to a web site, a fraudulent credit card transaction, or a suspicious traveler in an airport. The unifying assumption is that the number of anomalous points is quite small with respect to the population, so that deep screening of all individual data points would potentially be time-intensive, costly, and unnecessarily invasive of privacy. Such privacy violations can raise concerns due sensitive nature of data being used, raise fears about violations of data use agreements, and make people uncomfortable with anomaly detection methods. Anomaly detection is well studied, but methods to provide anomaly detection along with privacy are less well studied. Our overall goal in this research is to provide a framework for identifying anomalous data while guaranteeing quantifiable privacy in a rigorous sense. Once identified, such anomalies could warrant further data collection and investigation, depending on the context and relevant policies. In this research, we focus on privacy protection during the deployment of anomaly detection. Our main contribution is a differentially private access mechanism for finding anomalies using a search algorithm based on adaptive noisy group testing. To achieve this, we take as our starting point the notion of group testing [1], which was most famously used to screen US military draftees for syphilis during World War II. In group testing, individuals are tested in groups to limit the number of tests. Using multiple rounds of screenings, a small number of positive individuals can be detected very efficiently. Group testing has the added benefit of providing privacy to individuals through plausible deniability - since the group tests use aggregate data, individual contributions to the test are masked by the group. We follow on these concepts by demonstrating a search model utilizing adaptive queries on aggregated group data. Our work takes the first steps toward strengthening and formalizing these privacy concepts by achieving differential privacy [2]. Differential privacy is a statistical measure of disclosure risk that captures the intuition that an individual's privacy is protected if the results of a computation have at most a very small and quantifiable dependence on that individual's data. In the last decade, there hpractical adoption underway by high-profile companies such as Apple, Google, and Uber. In order to make differential privacy meaningful in the context of a task that seeks to specifically identify some (anomalous) individuals, we introduce the notion of anomaly-restricted differential privacy. Using ideas from information theory, we show that noise can be added to group query results in a way that provides differential privacy for non-anomalous individuals and still enables efficient and accurate detection of the anomalous individuals. 
Our method ensures that using differentially private aggregation of groups of points, providing privacy to individuals within the group while refining the group selection to the point that we can probabilistically narrow attention to a small numbers of individuals or samples for further attention. To summarize: We introduce a new notion of anomaly-restriction differential privacy, which may be of independent interest. We provide a noisy group-based search algorithm that satisfies the anomaly-restricted differential privacy definition. We provide both theoretical and empirical analysis of our noisy search algorithm, showing that it performs well in some cases, and exhibits the usual privacy/accuracy tradeoff of differentially private mechanisms. Potential anomaly detection applications for our work might include spatial search for outliers: this would rely on new sensing technologies that can perform queries in aggregate to reveal and isolate anomalous outliers. For example, this could lead to privacy-sensitive methods for searching for outlying cell phone activity patterns or Internet activity patterns in a geographic location.","PeriodicalId":318259,"journal":{"name":"Proceedings of the 10th ACM Workshop on Artificial Intelligence and Security","volume":"47 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2017-11-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 10th ACM Workshop on Artificial Intelligence and Security","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3128572.3140456","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 2

Abstract

We consider the problem of privacy-sensitive anomaly detection: screening to detect individuals, behaviors, areas, or data samples of high interest. What defines an anomaly is context-specific; for example, a spoofed rather than genuine user attempting to log in to a web site, a fraudulent credit card transaction, or a suspicious traveler in an airport. The unifying assumption is that the number of anomalous points is quite small with respect to the population, so that deep screening of all individual data points would potentially be time-intensive, costly, and unnecessarily invasive of privacy. Such privacy violations can raise concerns due to the sensitive nature of the data being used, raise fears about violations of data use agreements, and make people uncomfortable with anomaly detection methods. Anomaly detection is well studied, but methods that provide anomaly detection along with privacy are less well studied. Our overall goal in this research is to provide a framework for identifying anomalous data while guaranteeing quantifiable privacy in a rigorous sense. Once identified, such anomalies could warrant further data collection and investigation, depending on the context and relevant policies. In this research, we focus on privacy protection during the deployment of anomaly detection. Our main contribution is a differentially private access mechanism for finding anomalies using a search algorithm based on adaptive noisy group testing. To achieve this, we take as our starting point the notion of group testing [1], which was most famously used to screen US military draftees for syphilis during World War II. In group testing, individuals are tested in groups to limit the number of tests. Using multiple rounds of screenings, a small number of positive individuals can be detected very efficiently. Group testing has the added benefit of providing privacy to individuals through plausible deniability: since the group tests use aggregate data, individual contributions to the test are masked by the group. We build on these concepts by demonstrating a search model utilizing adaptive queries on aggregated group data. Our work takes the first steps toward strengthening and formalizing these privacy concepts by achieving differential privacy [2]. Differential privacy is a statistical measure of disclosure risk that captures the intuition that an individual's privacy is protected if the results of a computation have at most a very small and quantifiable dependence on that individual's data. In the last decade, differential privacy has received extensive study, with practical adoption underway at high-profile companies such as Apple, Google, and Uber. In order to make differential privacy meaningful in the context of a task that seeks to specifically identify some (anomalous) individuals, we introduce the notion of anomaly-restricted differential privacy. Using ideas from information theory, we show that noise can be added to group query results in a way that provides differential privacy for non-anomalous individuals and still enables efficient and accurate detection of the anomalous individuals. Our method uses differentially private aggregation of groups of points, providing privacy to individuals within the group while refining the group selection until we can probabilistically narrow attention to a small number of individuals or samples for further examination. To summarize: We introduce a new notion of anomaly-restricted differential privacy, which may be of independent interest.
We provide a noisy group-based search algorithm that satisfies the anomaly-restricted differential privacy definition. We provide both theoretical and empirical analysis of our noisy search algorithm, showing that it performs well in some cases and exhibits the usual privacy/accuracy tradeoff of differentially private mechanisms. Potential anomaly detection applications of our work include spatial search for outliers: this would rely on new sensing technologies that can perform queries in aggregate to reveal and isolate anomalies. For example, this could lead to privacy-sensitive methods for searching for outlying cell phone activity patterns or Internet activity patterns in a geographic location.
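To make the flavor of adaptive noisy group testing concrete, the following Python sketch repeatedly splits the population in half, queries each half with a Laplace-perturbed anomaly count, and recurses only into halves whose noisy count crosses a threshold. This is a minimal illustration under assumed choices, not the mechanism from the paper: the function names (noisy_group_count, noisy_binary_search), the fixed threshold, and the per-query epsilon are all hypothetical, and the paper's anomaly-restricted privacy analysis and noise calibration differ.

# Illustrative sketch only: binary-splitting group testing with Laplace noise
# on group counts. Not the authors' mechanism or calibration.
import random

import numpy as np


def noisy_group_count(labels, group, epsilon):
    # Count anomalies in `group` and add Laplace noise. The count has
    # sensitivity 1 (one individual changes it by at most one), so scale
    # 1/epsilon makes each query epsilon-differentially private on its own.
    true_count = sum(labels[i] for i in group)
    return true_count + np.random.laplace(loc=0.0, scale=1.0 / epsilon)


def noisy_binary_search(labels, group, epsilon, threshold=0.5):
    # Keep only halves whose noisy count exceeds the threshold and recurse
    # until singletons remain. A full analysis would also account for the
    # privacy cost of composing noisy queries across rounds.
    if len(group) == 1:
        return list(group) if noisy_group_count(labels, group, epsilon) > threshold else []
    mid = len(group) // 2
    flagged = []
    for half in (group[:mid], group[mid:]):
        if noisy_group_count(labels, half, epsilon) > threshold:
            flagged += noisy_binary_search(labels, half, epsilon, threshold)
    return flagged


if __name__ == "__main__":
    n, num_anomalies = 1024, 3
    labels = [0] * n
    for i in random.sample(range(n), num_anomalies):
        labels[i] = 1  # a handful of anomalous individuals
    print("flagged:", sorted(noisy_binary_search(labels, list(range(n)), epsilon=1.0)))

Because only aggregate, noise-perturbed counts are released at each round, any single non-anomalous individual has a small, quantifiable influence on what the analyst sees; the contribution of the paper is to formalize how such a guarantee can be restricted to non-anomalous individuals while the anomalies themselves remain locatable.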