PQBF:基于积量化的I/ o高效近似近邻搜索

Yingfan Liu, Hong Cheng, Jiangtao Cui
{"title":"PQBF:基于积量化的I/ o高效近似近邻搜索","authors":"Yingfan Liu, Hong Cheng, Jiangtao Cui","doi":"10.1145/3132847.3132901","DOIUrl":null,"url":null,"abstract":"Approximate nearest neighbor (ANN) search in high-dimensional space plays an essential role in many multimedia applications. Recently, product quantization (PQ) based methods for ANN search have attracted enormous attention in the community of computer vision, due to its good balance between accuracy and space requirement. PQ based methods embed a high-dimensional vector into a short binary code (called PQ code), and the squared Euclidean distance is estimated by asymmetric quantizer distance (AQD) with pretty high precision. Thus, ANN search in the original space can be converted to similarity search on AQD using the PQ approach. All existing PQ methods are in-memory solutions, which may not handle massive data if they cannot fit entirely in memory. In this paper, we propose an I/O-efficient PQ based solution for ANN search. We design an index called PQB+-forest to support efficient similarity search on AQD. PQB+-forest first creates a number of partitions of the PQ codes by a coarse quantizer and then builds a B+-tree, called PQB+-tree, for each partition. The search process is greatly expedited by focusing on a few selected partitions that are closest to the query, as well as by the pruning power of PQB+-trees. According to the experiments conducted on two large-scale data sets containing up to 1 billion vectors, our method outperforms its competitors, including the state-of-the-art PQ method and the state-of-the-art LSH methods for ANN search.","PeriodicalId":20449,"journal":{"name":"Proceedings of the 2017 ACM on Conference on Information and Knowledge Management","volume":"70 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2017-11-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"12","resultStr":"{\"title\":\"PQBF: I/O-Efficient Approximate Nearest Neighbor Search by Product Quantization\",\"authors\":\"Yingfan Liu, Hong Cheng, Jiangtao Cui\",\"doi\":\"10.1145/3132847.3132901\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Approximate nearest neighbor (ANN) search in high-dimensional space plays an essential role in many multimedia applications. Recently, product quantization (PQ) based methods for ANN search have attracted enormous attention in the community of computer vision, due to its good balance between accuracy and space requirement. PQ based methods embed a high-dimensional vector into a short binary code (called PQ code), and the squared Euclidean distance is estimated by asymmetric quantizer distance (AQD) with pretty high precision. Thus, ANN search in the original space can be converted to similarity search on AQD using the PQ approach. All existing PQ methods are in-memory solutions, which may not handle massive data if they cannot fit entirely in memory. In this paper, we propose an I/O-efficient PQ based solution for ANN search. We design an index called PQB+-forest to support efficient similarity search on AQD. PQB+-forest first creates a number of partitions of the PQ codes by a coarse quantizer and then builds a B+-tree, called PQB+-tree, for each partition. The search process is greatly expedited by focusing on a few selected partitions that are closest to the query, as well as by the pruning power of PQB+-trees. According to the experiments conducted on two large-scale data sets containing up to 1 billion vectors, our method outperforms its competitors, including the state-of-the-art PQ method and the state-of-the-art LSH methods for ANN search.\",\"PeriodicalId\":20449,\"journal\":{\"name\":\"Proceedings of the 2017 ACM on Conference on Information and Knowledge Management\",\"volume\":\"70 1\",\"pages\":\"\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2017-11-06\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"12\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the 2017 ACM on Conference on Information and Knowledge Management\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3132847.3132901\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 2017 ACM on Conference on Information and Knowledge Management","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3132847.3132901","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 12

摘要

高维空间的近似近邻搜索在许多多媒体应用中起着至关重要的作用。近年来,基于产品量化(PQ)的人工神经网络搜索方法因其在准确率和空间要求之间取得了良好的平衡而受到了计算机视觉界的广泛关注。基于PQ的方法将高维向量嵌入到短二进制码(称为PQ码)中,并通过非对称量化距离(AQD)估计欧几里得距离的平方,具有很高的精度。因此,使用PQ方法,可以将原始空间中的ANN搜索转换为AQD上的相似性搜索。所有现有的PQ方法都是内存解决方案,如果它们不能完全容纳在内存中,则可能无法处理大量数据。在本文中,我们提出了一种基于I/ o高效PQ的ANN搜索解决方案。我们设计了一个名为PQB+-forest的索引来支持AQD的高效相似性搜索。PQB+-forest首先通过粗量化器创建PQ代码的多个分区,然后为每个分区构建一个B+-树,称为PQB+-树。通过关注最接近查询的几个选定分区以及PQB+-树的修剪能力,极大地加快了搜索过程。根据在两个包含多达10亿个向量的大规模数据集上进行的实验,我们的方法优于其竞争对手,包括用于ANN搜索的最先进的PQ方法和最先进的LSH方法。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
PQBF: I/O-Efficient Approximate Nearest Neighbor Search by Product Quantization
Approximate nearest neighbor (ANN) search in high-dimensional space plays an essential role in many multimedia applications. Recently, product quantization (PQ) based methods for ANN search have attracted enormous attention in the community of computer vision, due to its good balance between accuracy and space requirement. PQ based methods embed a high-dimensional vector into a short binary code (called PQ code), and the squared Euclidean distance is estimated by asymmetric quantizer distance (AQD) with pretty high precision. Thus, ANN search in the original space can be converted to similarity search on AQD using the PQ approach. All existing PQ methods are in-memory solutions, which may not handle massive data if they cannot fit entirely in memory. In this paper, we propose an I/O-efficient PQ based solution for ANN search. We design an index called PQB+-forest to support efficient similarity search on AQD. PQB+-forest first creates a number of partitions of the PQ codes by a coarse quantizer and then builds a B+-tree, called PQB+-tree, for each partition. The search process is greatly expedited by focusing on a few selected partitions that are closest to the query, as well as by the pruning power of PQB+-trees. According to the experiments conducted on two large-scale data sets containing up to 1 billion vectors, our method outperforms its competitors, including the state-of-the-art PQ method and the state-of-the-art LSH methods for ANN search.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信