Speeding Up Permutation Based Indexing with Indexing

2009 Second International Workshop on Similarity Search and Applications. Pub Date: 2009-08-29. DOI: 10.1109/SISAP.2009.12

Karina Figueroa, K. Fredriksson

Citations: 18

Abstract

A recent probabilistic approach for searching in high-dimensional metric spaces is based on predicting the distances between database elements according to how they order their distances towards a set of distinguished elements, called permutants. In the preprocessing phase a set of permutants is chosen, and for every database element the permutants are sorted (permuted) by their distance to that element. The resulting permutations form the index. When a query is given, its corresponding permutation is computed and, as similar elements will (probably) have similar permutations, the database is compared in the order induced by the similarity between permutations. This works well but has relatively high CPU time, due to computing the distances between permutations and (partially) sorting the database by that similarity. We improve this by identifying the search for similar permutations as another metric space problem and solving it as such. This avoids many distance computations between the permutants. The experimental results show that this works extremely well in practice.
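The baseline scheme described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`permutation`, `footrule`, `build_index`, `search`), the `budget` parameter, and the choice of the Spearman footrule as the permutation similarity are all assumptions made here, since the abstract does not name a specific measure. The full sort inside `search` is the CPU-heavy step the abstract refers to.

```python
import math

def permutation(obj, permutants, dist):
    """Return permutant indices ordered by increasing distance to obj."""
    # Ties are broken by permutant index so the result is deterministic.
    return sorted(range(len(permutants)),
                  key=lambda i: (dist(obj, permutants[i]), i))

def footrule(p, q):
    """Spearman footrule: sum over permutants of the difference in rank.

    Assumed here as the similarity between permutations.
    """
    rank_in_q = {perm_id: rank for rank, perm_id in enumerate(q)}
    return sum(abs(rank - rank_in_q[perm_id]) for rank, perm_id in enumerate(p))

def build_index(database, permutants, dist):
    """Preprocessing: one permutation per database element."""
    return [permutation(x, permutants, dist) for x in database]

def search(query, database, permutants, index, dist, budget):
    """Approximate nearest neighbour using at most `budget` real distances
    to database elements (plus one per permutant for the query)."""
    q_perm = permutation(query, permutants, dist)
    # Baseline: compare q_perm with every stored permutation and sort.
    order = sorted(range(len(database)),
                   key=lambda i: (footrule(q_perm, index[i]), i))
    # Only the most promising elements are compared with the query directly.
    scored = [(dist(query, database[i]), i) for i in order[:budget]]
    best_dist, best_id = min(scored)
    return best_id, best_dist

if __name__ == "__main__":
    database = [(0, 0), (1, 0), (5, 5), (6, 5), (9, 9)]
    permutants = [(0, 0), (5, 5), (9, 9)]  # hypothetical choice of permutants
    index = build_index(database, permutants, math.dist)
    print(search((5.2, 4.9), database, permutants, index, math.dist, budget=2))
```

Because the footrule is itself a metric on permutations, finding the stored permutations closest to the query's permutation can be treated as a similarity search in its own right, which is the observation the abstract builds on.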
