High dimensional similarity search with space filling curves

Proceedings 17th International Conference on Data Engineering. Published: 2001-04-02. DOI: 10.1109/ICDE.2001.914876

Swanwa Liao, M. Lopez, Scott T. Leutenegger

Citations: 100

Abstract

We present a new approach for approximate nearest neighbor queries for sets of high-dimensional points under any $L_t$-metric, $t = 1, \ldots, \infty$. The proposed algorithm is efficient and simple to implement. The algorithm uses multiple shifted copies of the data points and stores them in up to $(d+1)$ B-trees, where $d$ is the dimensionality of the data, sorted according to their position along a space filling curve. This is done in a way that allows us to guarantee that a neighbor within an $O(d^{1+1/t})$ factor of the exact nearest neighbor can be returned with at most $(d+1)\log_p n$ page accesses, where $p$ is the branching factor of the B-trees. In practice, for real data sets, our approximate technique finds the exact nearest neighbor between 87% and 99% of the time and a point no farther than the third nearest neighbor between 98% and 100% of the time. Our solution is dynamic, allowing insertion or deletion of points in $O(d \log_p n)$ page accesses, and generalizes easily to find approximate k-nearest neighbors.
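The following is a minimal Python sketch of the structure the abstract describes. It keeps $d+1$ shifted copies of the points, each sorted by curve position. To answer a query it gathers the curve neighbours of the query in every copy and re-ranks them by exact $L_t$ distance. Points can be inserted and deleted. Several details are assumptions and do not come from the paper:

- coordinates are non-negative integers on a $2^{bits}$ grid;
- a Z-order (Morton) curve stands in for whichever space filling curve the authors use;
- the shift scheme (copy $j$ shifted by $j \cdot 2^{bits}/(d+1)$) is hypothetical;
- sorted in-memory lists replace B-trees;
- `window` is a hypothetical number of curve neighbours scanned on each side.

This sketch does not reproduce the paper's page-access bounds or approximation guarantee.

```python
import bisect
import math
from typing import List, Sequence, Tuple

Point = Tuple[int, ...]


def morton_key(p: Sequence[int], bits: int) -> int:
    """Z-order key: interleave the bits of non-negative integer coordinates."""
    key = 0
    for b in range(bits - 1, -1, -1):  # most significant bit level first
        for c in p:
            key = (key << 1) | ((c >> b) & 1)
    return key


def lt_dist(a: Sequence[float], b: Sequence[float], t: float) -> float:
    """Minkowski L_t distance; pass t = math.inf for the max-norm."""
    if math.isinf(t):
        return max(abs(x - y) for x, y in zip(a, b))
    return sum(abs(x - y) ** t for x, y in zip(a, b)) ** (1.0 / t)


class ShiftedCurveIndex:
    """d+1 shifted copies of the data, each kept sorted by curve position.

    A sorted Python list stands in for each B-tree (an assumption of this sketch).
    """

    def __init__(self, d: int, bits: int, t: float = 2.0, window: int = 2):
        self.bits, self.t, self.window = bits, t, window
        side = 1 << bits  # coordinates are assumed to lie in [0, side)
        # Hypothetical shift scheme: copy j moves every coordinate by j*side/(d+1).
        self.shifts = [j * side // (d + 1) for j in range(d + 1)]
        self.tables: List[List[Tuple[int, Point]]] = [[] for _ in self.shifts]

    def _key(self, p: Sequence[int], shift: int) -> int:
        # Shifted coordinates stay below 2*side, so bits + 1 bits suffice.
        return morton_key([c + shift for c in p], self.bits + 1)

    def insert(self, p: Sequence[int]) -> None:
        p = tuple(p)
        for s, table in zip(self.shifts, self.tables):
            bisect.insort(table, (self._key(p, s), p))

    def delete(self, p: Sequence[int]) -> None:
        p = tuple(p)
        for s, table in zip(self.shifts, self.tables):
            entry = (self._key(p, s), p)
            i = bisect.bisect_left(table, entry)
            if i < len(table) and table[i] == entry:
                table.pop(i)

    def knn(self, q: Sequence[int], k: int = 1) -> List[Point]:
        """Approximate k-NN: scan a few curve neighbours of q in every copy."""
        w = max(self.window, k)
        candidates = set()
        for s, table in zip(self.shifts, self.tables):
            # (key,) sorts before any (key, point), so this finds the first entry >= q's key.
            i = bisect.bisect_left(table, (self._key(q, s),))
            for _, pt in table[max(0, i - w): i + w]:
                candidates.add(pt)
        return sorted(candidates, key=lambda pt: lt_dist(pt, q, self.t))[:k]
```

For example, `ShiftedCurveIndex(d=2, bits=4)` followed by `insert((3, 3))` and `knn((3, 3))` returns `[(3, 3)]`. The shifts matter because two points that are close in space can fall on opposite sides of a large curve cell in one copy. In another, shifted copy the same two points are likely to be adjacent on the curve. Scanning every copy and keeping the best candidates is what makes the answer approximate rather than exact.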



