Multi-Party Private Set Intersection in Vertical Federated Learning

2020 IEEE 19th International Conference on Trust, Security and Privacy in Computing and Communications (TrustCom). Pub Date: 2020-12-01. DOI: 10.1109/TrustCom50675.2020.00098

Linpeng Lu, Ning Ding


Citations: 18

Abstract

Vertical federated learning (VFL) is a privacy-preserving machine learning framework in which the training dataset is vertically partitioned and distributed over multiple parties, i.e., each party holds only some of the attributes of each sample. In this paper we address the problem of computing private set intersection (PSI) in VFL, where a private set denotes the data held by a party that satisfies some distinguishing constraint. The problem asks how the parties can jointly compute the common IDs of their private sets, which plays a key role in many learning tasks such as decision tree learning. To our knowledge, all known PSI protocols either involve expensive cryptographic operations or were originally designed for the two-party setting, and the latter leak privacy-sensitive information in the multi-party setting when applied to pairs of parties one after another. We propose a new multi-party PSI protocol for VFL that can even handle the case in which some parties drop out while the protocol is running. Our protocol guarantees that any coalition of corrupted parties whose size is below a threshold cannot learn any secret information of the honest parties, thus preserving the privacy of the parties involved. Moreover, it relies only on lightweight cryptographic primitives (i.e., PRGs) and is therefore more efficient than the known protocols, especially as the number of samples in the dataset grows. Our starting point for solving the PSI problem in VFL is to reduce it to computing the AND of multiple bit-vectors, each held by one party and each identifying that party's private set within its data. Our main technical contribution is an efficient protocol for summing these vectors, called MulSUM, which we then adapt into the desired protocol, called MulAND, for computing the AND of these vectors. The result identifies the intersection of the private sets of all (online) parties, thus solving the PSI problem.
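The reduction described in the abstract can be illustrated with a toy sketch. It is not the authors' MulSUM or MulAND protocol, whose details the abstract does not give. It assumes a standard pairwise-mask construction in which every pair of parties shares a PRG seed, so that the masks cancel when all masked vectors are added. The function names (`prg`, `setup_pair_seeds`, `masked_vector`, `intersect_via_sum`), the SHA-256-based PRG, and the modulus `n + 1` are hypothetical choices made for illustration only. The sketch also ignores dropouts and reveals the per-position sums, which a real AND protocol would presumably need to hide.

```python
import hashlib
import secrets


def prg(seed, length, modulus):
    """Expand a seed into `length` pseudorandom values mod `modulus`.

    Illustrative PRG only: SHA-256 over seed || counter.
    """
    values = []
    for counter in range(length):
        digest = hashlib.sha256(seed + counter.to_bytes(4, "big")).digest()
        values.append(int.from_bytes(digest, "big") % modulus)
    return values


def setup_pair_seeds(n):
    """Give every pair of parties (i, j) one shared random seed (assumed setup)."""
    seeds = {i: {} for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            shared = secrets.token_bytes(16)
            seeds[i][j] = shared
            seeds[j][i] = shared
    return seeds


def masked_vector(party, bits, pair_seeds, modulus):
    """Mask a party's bit-vector with one PRG stream per peer.

    The lower-indexed party of each pair adds the stream and the
    higher-indexed party subtracts it, so all masks cancel in the total.
    """
    masked = list(bits)
    for peer, seed in pair_seeds[party].items():
        stream = prg(seed, len(bits), modulus)
        sign = 1 if party < peer else -1
        masked = [(m + sign * r) % modulus for m, r in zip(masked, stream)]
    return masked


def intersect_via_sum(bit_vectors, pair_seeds):
    """Sum the masked vectors; position k is in the intersection iff its sum is n."""
    n = len(bit_vectors)
    modulus = n + 1  # true sums lie in 0..n, so they are distinct mod n + 1
    totals = [0] * len(bit_vectors[0])
    for party, bits in enumerate(bit_vectors):
        masked = masked_vector(party, bits, pair_seeds, modulus)
        totals = [(t + v) % modulus for t, v in zip(totals, masked)]
    return [1 if t == n else 0 for t in totals]


# Three parties; bit k is 1 when sample ID k belongs to that party's private set.
vectors = [[1, 1, 0, 1], [1, 0, 1, 1], [1, 1, 1, 1]]
print(intersect_via_sum(vectors, setup_pair_seeds(len(vectors))))
```

The key observation is that, for n bit-vectors, the AND at a position is 1 exactly when the sum at that position equals n, which is why a summation protocol is a natural stepping stone to an AND protocol.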



