Cryptographically Secure Private Record Linkage Using Locality-Sensitive Hashing

Proc. VLDB Endow. Pub Date: 2023-10-01. DOI: 10.14778/3626292.3626293

Ruidi Wei, F. Kerschbaum


Abstract

Private record linkage (PRL) is the problem of identifying pairs of records that approximately match across datasets in a secure, privacy-preserving manner. Two-party PRL specifically allows each party to obtain records from the other party only if each such record matches one of its own. The privacy goal is that no information about the datasets other than the matching records is released. A fundamental challenge is to avoid leaking information while also avoiding a comparison of all pairs of records. In plaintext record linkage, this is done using a blocking strategy, e.g., locality-sensitive hashing. One recent approach, proposed by He et al. (ACM CCS 2017), uses locality-sensitive hashing and then releases a provably differentially private representation of the hash bins. However, differential privacy still leaks some information, albeit a provably bounded amount, and does not protect against attacks such as property inference attacks. Another recent approach, by Khurram and Kerschbaum (IEEE ICDE 2020), uses locality-preserving hashing and provides cryptographic security, i.e., it releases no information except the output. However, locality-preserving hash functions are much harder to construct than locality-sensitive hash functions, and hence the accuracy of this approach is limited, particularly on larger datasets. In this paper, we address the open problem of providing cryptographic security for PRL while using locality-sensitive hash functions. Using recent results in oblivious algorithms, we design a new cryptographically secure PRL protocol with locality-sensitive hash functions. Our prototype implementation can match 40,000 records in the British National Library/Toronto Public Library and the North Carolina Voter Registry datasets with 99.3% and 99.9% accuracy, respectively, in less than an hour, which is more than an order of magnitude faster than Khurram and Kerschbaum's work and at a higher accuracy.
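
To make the blocking idea mentioned in the abstract concrete, the following is a minimal plaintext sketch of locality-sensitive hashing used as a blocking step. It is not the protocol from the paper and offers no privacy or cryptographic protection: both datasets are processed in the clear by a single party. The choice of MinHash over character bigrams, the banding scheme, and all function names and parameter values (`shingles`, `minhash_signature`, `band_keys`, `candidate_pairs`, `num_hashes=20`, `bands=10`) are assumptions made for illustration only.

```python
import hashlib
from collections import defaultdict
from itertools import product


def shingles(text, q=2):
    """Return the set of character q-grams of a lowercased string."""
    s = text.lower()
    return {s[i:i + q] for i in range(len(s) - q + 1)}


def minhash_signature(tokens, num_hashes):
    """Compute one MinHash value per seeded hash function."""
    tokens = tokens or {""}  # guard against strings shorter than q
    signature = []
    for seed in range(num_hashes):
        signature.append(min(
            int.from_bytes(
                hashlib.blake2b(f"{seed}:{t}".encode(), digest_size=8).digest(),
                "big",
            )
            for t in tokens
        ))
    return tuple(signature)


def band_keys(signature, bands):
    """Split a signature into bands; each band is one blocking key."""
    rows = len(signature) // bands
    return [(b, signature[b * rows:(b + 1) * rows]) for b in range(bands)]


def candidate_pairs(records_a, records_b, num_hashes=20, bands=10):
    """Return index pairs (i, j) whose records share at least one band."""
    # Each bucket holds the record indices from dataset A and dataset B.
    buckets = defaultdict(lambda: (set(), set()))
    for side, records in enumerate((records_a, records_b)):
        for idx, rec in enumerate(records):
            sig = minhash_signature(shingles(rec), num_hashes)
            for key in band_keys(sig, bands):
                buckets[key][side].add(idx)
    pairs = set()
    for ids_a, ids_b in buckets.values():
        pairs.update(product(ids_a, ids_b))
    return pairs


if __name__ == "__main__":
    a = ["john smith", "alice jones"]
    b = ["john smith", "zzzz qqqq"]
    # Only pairs that collide in some band are compared further.
    print(sorted(candidate_pairs(a, b)))
```

Only the candidate pairs returned by the blocking step would then be passed to a full similarity comparison, which is how blocking avoids comparing all pairs. The bucket contents themselves reveal which records hash together, which is the kind of leakage the abstract says a secure PRL protocol must avoid.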


