Locality-Sensitive Hashing Scheme Based on Heap Sort of Hash Bucket

2019 14th International Conference on Computer Science & Education (ICCSE). Pub Date: 2019-08-01. DOI: 10.1109/ICCSE.2019.8845438

Bo Fang, Zhongyun Hua, Hejiao Huang


Citations: 2

Abstract

Nearest neighbor search (NNS) is currently a popular research direction and is widely used in machine learning, pattern recognition, image detection, and other fields. For low-dimensional data, tree-based search methods achieve good results, but as the data dimension grows they suffer from the curse of dimensionality. The Locality-Sensitive Hashing (LSH) algorithm greatly improves the efficiency of nearest neighbor queries on high-dimensional data, but it relies on building a large number of hash tables, which makes its space complexity very high. C2LSH, which is based on dynamic collision counting, addresses this drawback of LSH, but it must count the collisions of a large number of data points, which increases query time. Later researchers have proposed many other improvements based on LSH, but their results are still not ideal.

In this paper, we propose the Locality-Sensitive Hashing Scheme Based on Heap Sort of Hash Bucket (HSLSH) to address the shortcomings of LSH and C2LSH. Its main idea is to exploit the efficiency of heapsort on massive data to speed up nearest neighbor queries. Because it relies on only a small number of hash functions, it overcomes LSH's need to build a large number of hash tables while avoiding the defects of C2LSH. Experiments show that our algorithm is more than 20% better than C2LSH in query accuracy and 40% lower in query time.
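The abstract does not spell out the HSLSH procedure, so the following is only an illustrative sketch of the ingredients it names, not the authors' algorithm. It assumes E2LSH-style p-stable hash functions h(v) = ⌊(a·v + b)/w⌋ with Gaussian a and uniform b in [0, w), counts how many of m hash functions each point shares with the query (in the spirit of C2LSH's collision counting), and then uses Python's `heapq` to pick the top candidates by collision count instead of fully sorting all counted points. The survivors are re-ranked by exact Euclidean distance. All function names and parameters (`m`, `w`, `num_candidates`) are hypothetical.

```python
import heapq
import math
import random
from collections import Counter


def make_hash_functions(dim, m, w, seed=0):
    """Draw m p-stable (Gaussian) hash functions h(v) = floor((a.v + b) / w)."""
    rng = random.Random(seed)
    funcs = []
    for _ in range(m):
        a = [rng.gauss(0.0, 1.0) for _ in range(dim)]  # random projection direction
        b = rng.uniform(0.0, w)                        # random offset in [0, w)
        funcs.append((a, b))
    return funcs


def hash_value(func, v, w):
    """Bucket id of vector v under one hash function."""
    a, b = func
    return math.floor((sum(ai * vi for ai, vi in zip(a, v)) + b) / w)


def build_index(data, funcs, w):
    """One bucket table per hash function: bucket id -> list of point ids."""
    tables = []
    for f in funcs:
        table = {}
        for pid, v in enumerate(data):
            table.setdefault(hash_value(f, v, w), []).append(pid)
        tables.append(table)
    return tables


def query(q, data, funcs, tables, w, k, num_candidates):
    """Return up to k point ids, nearest first (approximate)."""
    # Count, per point, how many of the m hash functions put it in q's bucket.
    counts = Counter()
    for f, table in zip(funcs, tables):
        for pid in table.get(hash_value(f, q, w), []):
            counts[pid] += 1
    # Heap-based top-N selection by collision count instead of a full sort.
    candidates = heapq.nlargest(num_candidates, counts.items(),
                                key=lambda item: item[1])
    # Re-rank the surviving candidates by exact Euclidean distance.
    return heapq.nsmallest(k, (pid for pid, _ in candidates),
                           key=lambda pid: math.dist(q, data[pid]))
```

Selecting the top N of n counted points with a bounded heap costs O(n log N) rather than the O(n log n) of a full sort, which is the kind of saving the abstract attributes to heapsort when the candidate set is large. A query might look like `query(data[3], data, funcs, build_index(data, funcs, 4.0), 4.0, k=5, num_candidates=50)`.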



