Privacy-Preserving Membership Queries for Federated Anomaly Detection

Jelle Vos, Sikha Pentyala, Steven Golob, Ricardo Maia, Dean Kelley, Z. Erkin, Martine De Cock, Anderson Nascimento
{"title":"Privacy-Preserving Membership Queries for Federated Anomaly Detection","authors":"Jelle Vos, Sikha Pentyala, Steven Golob, Ricardo Maia, Dean Kelley, Z. Erkin, Martine De Cock, Anderson Nascimento","doi":"10.56553/popets-2024-0074","DOIUrl":null,"url":null,"abstract":"In this work, we propose a new privacy-preserving membership query protocol that lets a centralized entity privately query datasets held by one or more other parties to check if they contain a given element. This protocol, based on elliptic curve-based ElGamal and oblivious key-value stores, ensures that those 'data-augmenting' parties only have to send their encrypted data to the centralized entity once, making the protocol particularly efficient when the centralized entity repeatedly queries the same sets of data. We apply this protocol to detect anomalies in cross-silo federations. Data anomalies across such cross-silo federations are challenging to detect because (1) the centralized entities have little knowledge of the actual users, (2) the data-augmenting entities do not have a global view of the system, and (3) privacy concerns and regulations prevent pooling all the data. Our protocol allows for anomaly detection even in strongly separated distributed systems while protecting users' privacy. Specifically, we propose a cross-silo federated architecture in which a centralized entity (the backbone) has labeled data to train a machine learning model for detecting anomalous instances. The other entities in the federation are data-augmenting clients (the user-facing entities) who collaborate with the centralized entity to extract feature values to improve the utility of the model. These feature values are computed using our privacy-preserving membership query protocol. The model can be trained with an off-the-shelf machine learning algorithm that provides differential privacy to prevent it from memorizing instances from the training data, thereby providing output privacy. 
However, it is not straightforward to also efficiently provide input privacy, which ensures that none of the entities in the federation ever see the data of other entities in an unencrypted form. We demonstrate the effectiveness of our approach in the financial domain, motivated by the PETs Prize Challenge, which is a collaborative effort between the US and UK governments to combat international fraudulent transactions. We show that the private queries significantly increase the precision and recall of the otherwise centralized system and argue that this improvement translates to other use cases as well.","PeriodicalId":519525,"journal":{"name":"Proceedings on Privacy Enhancing Technologies","volume":"89 2","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2024-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings on Privacy Enhancing Technologies","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.56553/popets-2024-0074","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

In this work, we propose a new privacy-preserving membership query protocol that lets a centralized entity privately query datasets held by one or more other parties to check if they contain a given element. This protocol, built on elliptic-curve ElGamal and oblivious key-value stores, ensures that those 'data-augmenting' parties only have to send their encrypted data to the centralized entity once, making the protocol particularly efficient when the centralized entity repeatedly queries the same sets of data. We apply this protocol to detect anomalies in cross-silo federations. Data anomalies across such cross-silo federations are challenging to detect because (1) the centralized entities have little knowledge of the actual users, (2) the data-augmenting entities do not have a global view of the system, and (3) privacy concerns and regulations prevent pooling all the data. Our protocol allows for anomaly detection even in strongly separated distributed systems while protecting users' privacy. Specifically, we propose a cross-silo federated architecture in which a centralized entity (the backbone) has labeled data to train a machine learning model for detecting anomalous instances. The other entities in the federation are data-augmenting clients (the user-facing entities) who collaborate with the centralized entity to extract feature values to improve the utility of the model. These feature values are computed using our privacy-preserving membership query protocol. The model can be trained with an off-the-shelf machine learning algorithm that provides differential privacy to prevent it from memorizing instances from the training data, thereby providing output privacy. However, it is not straightforward to also efficiently provide input privacy, which ensures that none of the entities in the federation ever see the data of other entities in an unencrypted form. We demonstrate the effectiveness of our approach in the financial domain, motivated by the PETs Prize Challenge, which is a collaborative effort between the US and UK governments to combat international fraudulent transactions. We show that the private queries significantly increase the precision and recall of the otherwise centralized system and argue that this improvement translates to other use cases as well.
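The membership test at the heart of such a protocol can be illustrated with a toy sketch. The following hypothetical Python example uses exponential ElGamal over a small multiplicative group (the paper's actual protocol uses elliptic-curve ElGamal together with an oblivious key-value store, and the parameters below are insecure toy values): the data-augmenting party encrypts each set element once, and the querier homomorphically computes Enc(r·(x − query)), which decrypts to the group identity exactly when x equals the query.

```python
# Toy sketch of a private membership query with additively homomorphic
# (exponential) ElGamal. Hypothetical illustration only: the paper's protocol
# uses elliptic-curve ElGamal plus an oblivious key-value store, and these
# parameters are far too small to be secure.
import secrets

p = 1019   # safe prime: p = 2q + 1
q = 509    # prime order of the subgroup generated by g
g = 4      # quadratic residue mod p, so it generates the order-q subgroup

sk = secrets.randbelow(q - 1) + 1   # decryption key
h = pow(g, sk, p)                   # public key

def enc(m):
    """Enc(m) = (g^k, h^k * g^m): the message lives in the exponent."""
    k = secrets.randbelow(q - 1) + 1
    return (pow(g, k, p), pow(h, k, p) * pow(g, m % q, p) % p)

def decrypts_to_zero(ct):
    """c2 * c1^(-sk) equals g^m, which is 1 exactly when m = 0 mod q."""
    c1, c2 = ct
    return c2 * pow(c1, -sk, p) % p == 1

def hom_query(ct, query):
    """Homomorphically form Enc(r * (x - query)) with a random blind r."""
    c1, c2 = ct
    c2 = c2 * pow(g, -query % q, p) % p    # subtract query in the exponent
    r = secrets.randbelow(q - 1) + 1       # blinds non-matches to random values
    return (pow(c1, r, p), pow(c2, r, p))

# The data-augmenting party encrypts its set once and sends it over;
# the querier can then run arbitrarily many membership queries against it.
encrypted_set = [enc(x) for x in (17, 42, 99)]

def member(query):
    return any(decrypts_to_zero(hom_query(ct, query)) for ct in encrypted_set)

print(member(42), member(7))  # True False
```

This single-process sketch glosses over role separation: in a real deployment the decryption step sits with a different party than the querier, so the querier learns only the membership bit, and the oblivious key-value store additionally hides the structure of the encrypted set.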