Gaussian Kernel-Based LSH for High-Dimensional Similarity Search
Authors: Masrat Rasool; Khelil Kassoul; Samir Brahim Belhaouari
Journal: IEEE Open Journal of the Computer Society, vol. 6, pp. 1402-1413
Published: 2025-08-25
DOI: 10.1109/OJCS.2025.3602355
URL: https://ieeexplore.ieee.org/document/11134766/
High-dimensional similarity search remains a critical challenge in machine learning, particularly when data lie on complex, non-linear manifolds that undermine the effectiveness of classical Locality-Sensitive Hashing (LSH). This work introduces Gaussian LSH, a kernel-based hashing framework that integrates over-clustering with Gaussian probability density modelling to improve locality preservation while maintaining computational efficiency. The method generates compact binary codes from a hybrid kernel–PDF score and supports scalable GPU-accelerated indexing for large datasets. Empirical evaluations across multiple visual and textual benchmarks demonstrate consistent improvements in recall and query latency compared to representative LSH variants and approximate nearest neighbour libraries. Gaussian LSH achieves recall gains of up to 9 percentage points (pp) and latency reductions of up to $4.3\times$, with benefits sustained across a range of code lengths. These results highlight the approach’s scalability and accuracy, supporting its use in medium- to large-scale similarity retrieval tasks across diverse data domains.
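The abstract names the ingredients (over-clustering, Gaussian density modelling, a hybrid kernel–PDF score, binary codes) but not their exact form. The sketch below is one possible reading, not the authors' implementation. Everything specific in it is an assumption made for illustration: the function names (`fit_model`, `hybrid_log_scores`, `encode`, `hamming_search`), plain Lloyd's k-means as the over-clustering step, one isotropic Gaussian per cluster, the score defined as the log of an RBF kernel times the Gaussian PDF, one bit per cluster obtained by thresholding at the training median, and the `gamma` value. It runs on the CPU with NumPy only and does not attempt the GPU-accelerated indexing mentioned above.

```python
import numpy as np


def sq_dists(X, C):
    """Squared Euclidean distance from each row of X to each row of C."""
    return ((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=2)


def hybrid_log_scores(X, model):
    """ASSUMED hybrid score: log(RBF kernel * isotropic Gaussian PDF), per cluster."""
    X = np.asarray(X, dtype=float)
    d2 = sq_dists(X, model["centroids"])              # shape (n_points, n_bits)
    var = model["variances"]                          # shape (n_bits,)
    log_kernel = -model["gamma"] * d2
    log_pdf = -0.5 * d2 / var - 0.5 * X.shape[1] * np.log(2.0 * np.pi * var)
    return log_kernel + log_pdf


def fit_model(X, n_bits, gamma=0.05, n_iter=20, seed=0, eps=1e-6):
    """Over-cluster X into n_bits clusters and fit one Gaussian per cluster.

    ASSUMPTION: n_bits is chosen larger than the number of natural groups,
    which is how "over-clustering" is interpreted in this sketch.
    """
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=n_bits, replace=False)].copy()
    for _ in range(n_iter):                           # plain Lloyd's iterations
        labels = sq_dists(X, centroids).argmin(axis=1)
        for k in range(n_bits):
            members = X[labels == k]
            if len(members):                          # keep centroid if cluster is empty
                centroids[k] = members.mean(axis=0)
    labels = sq_dists(X, centroids).argmin(axis=1)

    variances = np.full(n_bits, eps)                  # eps keeps variances positive
    for k in range(n_bits):
        members = X[labels == k]
        if len(members) > 1:
            variances[k] += ((members - centroids[k]) ** 2).mean()

    model = {"centroids": centroids, "variances": variances, "gamma": gamma}
    # ASSUMED binarisation rule: per-bit median of the training scores.
    model["thresholds"] = np.median(hybrid_log_scores(X, model), axis=0)
    return model


def encode(X, model):
    """Map points to binary codes, one bit per cluster."""
    return (hybrid_log_scores(X, model) > model["thresholds"]).astype(np.uint8)


def hamming_search(query_codes, db_codes, k):
    """Return indices and Hamming distances of the k closest database codes."""
    dist = (query_codes[:, None, :] != db_codes[None, :, :]).sum(axis=2)
    order = np.argsort(dist, axis=1, kind="stable")[:, :k]
    return order, np.take_along_axis(dist, order, axis=1)


# Toy usage on synthetic data: two Gaussian blobs in 16 dimensions.
rng = np.random.default_rng(42)
data = np.vstack([rng.normal(0.0, 1.0, (200, 16)),
                  rng.normal(5.0, 1.0, (200, 16))])
model = fit_model(data, n_bits=8)
codes = encode(data, model)
neighbours, distances = hamming_search(codes[:5], codes, k=3)
```

Thresholding each bit at its training median keeps the bits roughly balanced, which is a common heuristic for hashing codes in general; whether the paper uses this rule is not stated in the abstract. The toy data and parameter values say nothing about the recall or latency figures reported above.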