用空箱双向致密化LSH草图

Peng Jia, Pinghui Wang, Junzhou Zhao, Shuo Zhang, Yiyan Qi, Min Hu, Chao Deng, X. Guan
{"title":"用空箱双向致密化LSH草图","authors":"Peng Jia, Pinghui Wang, Junzhou Zhao, Shuo Zhang, Yiyan Qi, Min Hu, Chao Deng, X. Guan","doi":"10.1145/3448016.3452833","DOIUrl":null,"url":null,"abstract":"As an efficient tool for approximate similarity computation and search, Locality Sensitive Hashing (LSH) has been widely used in many research areas including databases, data mining, information retrieval, and machine learning. Classical LSH methods typically require to perform hundreds or even thousands of hashing operations when computing the LSH sketch for each input item (e.g., a set or a vector); however, this complexity is still too expensive and even impractical for applications requiring processing data in real-time. To address this issue, several fast methods such as OPH and BCWS have been proposed to efficiently compute the LSH sketches; however, these methods may generate many sketches with empty bins, which may introduce large errors for similarity estimation and also limit their usage for fast similarity search. To solve this issue, we propose a novel densification method, i.e., BiDens. Compared with existing densification methods, our BiDens is more efficient to fill a sketch's empty bins with values of its non-empty bins in either the forward or backward directions. Furthermore, it also densifies empty bins to satisfy the densification principle (i.e., the LSH property). Theoretical analysis and experimental results on similarity estimation, fast similarity search, and kernel linearization using real-world datasets demonstrate that our BiDens is up to 106 times faster than state-of-the-art methods while achieving the same or even better accuracy.","PeriodicalId":360379,"journal":{"name":"Proceedings of the 2021 International Conference on Management of Data","volume":"9 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-06-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"8","resultStr":"{\"title\":\"Bidirectionally Densifying LSH Sketches with Empty Bins\",\"authors\":\"Peng Jia, Pinghui Wang, Junzhou Zhao, Shuo Zhang, Yiyan Qi, Min Hu, Chao Deng, X. Guan\",\"doi\":\"10.1145/3448016.3452833\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"As an efficient tool for approximate similarity computation and search, Locality Sensitive Hashing (LSH) has been widely used in many research areas including databases, data mining, information retrieval, and machine learning. Classical LSH methods typically require to perform hundreds or even thousands of hashing operations when computing the LSH sketch for each input item (e.g., a set or a vector); however, this complexity is still too expensive and even impractical for applications requiring processing data in real-time. To address this issue, several fast methods such as OPH and BCWS have been proposed to efficiently compute the LSH sketches; however, these methods may generate many sketches with empty bins, which may introduce large errors for similarity estimation and also limit their usage for fast similarity search. To solve this issue, we propose a novel densification method, i.e., BiDens. Compared with existing densification methods, our BiDens is more efficient to fill a sketch's empty bins with values of its non-empty bins in either the forward or backward directions. Furthermore, it also densifies empty bins to satisfy the densification principle (i.e., the LSH property). Theoretical analysis and experimental results on similarity estimation, fast similarity search, and kernel linearization using real-world datasets demonstrate that our BiDens is up to 106 times faster than state-of-the-art methods while achieving the same or even better accuracy.\",\"PeriodicalId\":360379,\"journal\":{\"name\":\"Proceedings of the 2021 International Conference on Management of Data\",\"volume\":\"9 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2021-06-09\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"8\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the 2021 International Conference on Management of Data\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3448016.3452833\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 2021 International Conference on Management of Data","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3448016.3452833","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 8

摘要

局部敏感哈希作为一种高效的近似相似度计算和搜索工具,在数据库、数据挖掘、信息检索和机器学习等研究领域得到了广泛的应用。当为每个输入项(例如,一个集合或一个向量)计算LSH草图时,经典的LSH方法通常需要执行数百甚至数千个哈希操作;然而,对于需要实时处理数据的应用程序来说,这种复杂性仍然过于昂贵,甚至不切实际。为了解决这一问题,人们提出了几种快速的方法,如OPH和BCWS,以有效地计算LSH草图;然而,这些方法可能会生成许多带有空箱子的草图,这可能会给相似度估计带来很大的误差,也限制了它们在快速相似度搜索中的应用。为了解决这一问题,我们提出了一种新的致密化方法,即BiDens。与现有的致密化方法相比,我们的BiDens在正向或反向方向上用草图的非空桶的值填充草图的空桶时效率更高。此外,它还对空仓进行致密化,以满足致密化原理(即LSH特性)。在相似度估计、快速相似度搜索和核线性化方面的理论分析和实验结果表明,我们的BiDens比最先进的方法快106倍,同时达到相同甚至更好的精度。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
Bidirectionally Densifying LSH Sketches with Empty Bins
As an efficient tool for approximate similarity computation and search, Locality Sensitive Hashing (LSH) has been widely used in many research areas including databases, data mining, information retrieval, and machine learning. Classical LSH methods typically require to perform hundreds or even thousands of hashing operations when computing the LSH sketch for each input item (e.g., a set or a vector); however, this complexity is still too expensive and even impractical for applications requiring processing data in real-time. To address this issue, several fast methods such as OPH and BCWS have been proposed to efficiently compute the LSH sketches; however, these methods may generate many sketches with empty bins, which may introduce large errors for similarity estimation and also limit their usage for fast similarity search. To solve this issue, we propose a novel densification method, i.e., BiDens. Compared with existing densification methods, our BiDens is more efficient to fill a sketch's empty bins with values of its non-empty bins in either the forward or backward directions. Furthermore, it also densifies empty bins to satisfy the densification principle (i.e., the LSH property). Theoretical analysis and experimental results on similarity estimation, fast similarity search, and kernel linearization using real-world datasets demonstrate that our BiDens is up to 106 times faster than state-of-the-art methods while achieving the same or even better accuracy.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信