Fast, Private and Verifiable: Server-aided Approximate Similarity Computation over Large-Scale Datasets

Proceedings of the 4th ACM International Workshop on Security in Cloud Computing, Pub Date: 2016-05-30, DOI: 10.1145/2898445.2898453

Shuo Qiu, Boyang Wang, Ming Li, Jesse Victors, Jiqiang Liu, Yanfeng Shi, Wei Wang


Citations: 4

Abstract

Computing similarity, especially Jaccard Similarity, between two datasets is a fundamental building block in big data analytics, with extensive applications including genome matching, plagiarism detection, and social networking. Increasing user privacy concerns over the release of sensitive data have made it desirable and necessary for two users to evaluate Jaccard Similarity over their datasets in a privacy-preserving manner. In this paper, we propose two efficient and secure protocols to compute the Jaccard Similarity of two users' private sets with the help of a server that is not fully trusted. Specifically, to boost efficiency, we apply the Minhashing algorithm to encrypted data, and the output of our protocols is guaranteed to be a close approximation of the exact value. In both protocols, only an approximate similarity result is leaked to the server and users. The first protocol is secure against a semi-honest server, while the second protocol, with a novel consistency-check mechanism, further achieves result verifiability against a malicious server that cheats during execution. Experimental results show that our first protocol computes an approximate Jaccard Similarity of two billion-element sets within only 6 minutes (under 256-bit security in parallel mode). To the best of our knowledge, our consistency-check mechanism is the first work to realize efficient verification specifically for approximate similarity computation.
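For readers unfamiliar with Minhashing, the following is a minimal plaintext sketch of how a MinHash signature approximates Jaccard Similarity. It is not the protocol proposed in the paper: it involves no encryption, no server, and no consistency check. The function names, the use of salted BLAKE2b as the hash family, and the signature length are all assumptions chosen for illustration.

```python
import hashlib


def exact_jaccard(a, b):
    """Exact Jaccard Similarity |A ∩ B| / |A ∪ B| of two non-empty sets."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b)


def _hash64(item, seed):
    """64-bit hash of an item under a seed (illustrative choice: salted BLAKE2b)."""
    salt = seed.to_bytes(8, "big")
    digest = hashlib.blake2b(str(item).encode("utf-8"), digest_size=8, salt=salt).digest()
    return int.from_bytes(digest, "big")


def minhash_signature(items, num_hashes=256):
    """For each of num_hashes seeded hash functions, keep the minimum hash over the set."""
    items = set(items)
    if not items:
        raise ValueError("MinHash is undefined for an empty set")
    return [min(_hash64(x, seed) for x in items) for seed in range(num_hashes)]


def estimate_jaccard(sig_a, sig_b):
    """Fraction of signature positions that agree; this estimates the Jaccard Similarity."""
    if len(sig_a) != len(sig_b):
        raise ValueError("signatures must have the same length")
    matches = sum(1 for x, y in zip(sig_a, sig_b) if x == y)
    return matches / len(sig_a)


if __name__ == "__main__":
    set_a = set(range(0, 1000))
    set_b = set(range(500, 1500))
    sig_a = minhash_signature(set_a)
    sig_b = minhash_signature(set_b)
    print("exact:   ", exact_jaccard(set_a, set_b))
    print("estimate:", estimate_jaccard(sig_a, sig_b))
```

Each signature has a fixed length regardless of set size, so two parties only need to compare short signatures rather than full sets. With k hash functions the standard error of the estimate is roughly sqrt(J(1-J)/k), so a larger signature gives a closer approximation at a higher computation cost.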

