Fast, Private and Verifiable: Server-aided Approximate Similarity Computation over Large-Scale Datasets

Shuo Qiu, Boyang Wang, Ming Li, Jesse Victors, Jiqiang Liu, Yanfeng Shi, Wei Wang
{"title":"Fast, Private and Verifiable: Server-aided Approximate Similarity Computation over Large-Scale Datasets","authors":"Shuo Qiu, Boyang Wang, Ming Li, Jesse Victors, Jiqiang Liu, Yanfeng Shi, Wei Wang","doi":"10.1145/2898445.2898453","DOIUrl":null,"url":null,"abstract":"Computing similarity, especially Jaccard Similarity, between two datasets is a fundamental building block in big data analytics, and extensive applications including genome matching, plagiarism detection, social networking, etc. The increasing user privacy concerns over the release of has sensitive data have made it desirable and necessary for two users to evaluate Jaccard Similarity over their datasets in a privacy-preserving manner. In this paper, we propose two efficient and secure protocols to compute the Jaccard Similarity of two users' private sets with the help of an unfully-trusted server. Specifically, in order to boost the efficiency, we leverage Minhashing algorithm on encrypted data, where the output of our protocols is guaranteed to be a close approximation of the exact value. In both protocols, only an approximate similarity result is leaked to the server and users. The first protocol is secure against a semi-honest server, while the second protocol, with a novel consistency-check mechanism, further achieves result verifiability against a malicious server who cheats in the executions. Experimental results show that our first protocol computes an approximate Jaccard Similarity of two billion-element sets within only 6 minutes (under 256-bit security in parallel mode). To the best of our knowledge, our consistency-check mechanism represents the very first work to realize an efficient verification particularly on approximate similarity computation.","PeriodicalId":187535,"journal":{"name":"Proceedings of the 4th ACM International Workshop on Security in Cloud Computing","volume":"14 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2016-05-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"4","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 4th ACM International Workshop on Security in Cloud Computing","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/2898445.2898453","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 4

Abstract

Computing similarity, especially Jaccard Similarity, between two datasets is a fundamental building block in big data analytics, and extensive applications including genome matching, plagiarism detection, social networking, etc. The increasing user privacy concerns over the release of has sensitive data have made it desirable and necessary for two users to evaluate Jaccard Similarity over their datasets in a privacy-preserving manner. In this paper, we propose two efficient and secure protocols to compute the Jaccard Similarity of two users' private sets with the help of an unfully-trusted server. Specifically, in order to boost the efficiency, we leverage Minhashing algorithm on encrypted data, where the output of our protocols is guaranteed to be a close approximation of the exact value. In both protocols, only an approximate similarity result is leaked to the server and users. The first protocol is secure against a semi-honest server, while the second protocol, with a novel consistency-check mechanism, further achieves result verifiability against a malicious server who cheats in the executions. Experimental results show that our first protocol computes an approximate Jaccard Similarity of two billion-element sets within only 6 minutes (under 256-bit security in parallel mode). To the best of our knowledge, our consistency-check mechanism represents the very first work to realize an efficient verification particularly on approximate similarity computation.
快速,私有和可验证:大规模数据集的服务器辅助近似相似性计算
计算两个数据集之间的相似度,特别是Jaccard相似度是大数据分析的基本组成部分,在基因组匹配、抄袭检测、社交网络等领域有着广泛的应用。越来越多的用户对敏感数据发布的隐私担忧使得两个用户以隐私保护的方式评估他们数据集的Jaccard相似性是可取的和必要的。本文在不完全信任服务器的帮助下,提出了两种高效且安全的协议来计算两个用户私有集的Jaccard相似性。具体来说,为了提高效率,我们在加密数据上使用了散列算法,我们的协议的输出保证是精确值的接近值。在这两种协议中,只有一个近似的相似结果被泄露给服务器和用户。第一个协议对于半诚实的服务器是安全的,而第二个协议具有新颖的一致性检查机制,进一步实现了针对在执行中作弊的恶意服务器的结果可验证性。实验结果表明,我们的第一个协议仅在6分钟内(在并行模式下256位安全性下)计算了20亿元素集的近似Jaccard相似性。据我们所知,我们的一致性检查机制代表了第一个实现有效验证的工作,特别是在近似相似性计算上。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信