A sampling-based estimator for top-k selection query

Proceedings 18th International Conference on Data Engineering; Pub Date: 2002-08-07; DOI: 10.1109/ICDE.2002.994779

Chung-Min Chen, Y. Ling

Citations: 26

Abstract

Top-k queries arise naturally in many database applications that require searching for records whose attribute values are close to those specified in a query. We study the problem of processing a top-k query by translating it into an approximate range query that can be efficiently processed by traditional relational DBMSs. We propose a sampling-based approach, along with various query mapping strategies, to determine a range query that yields high recall with low access cost. Our experiments on real-world datasets show that, given the same memory budget, our sampling-based estimator outperforms a previous histogram-based method in terms of access cost while achieving the same level of recall. Furthermore, unlike the histogram-based approach, our sampling-based query mapping scheme scales well to high-dimensional data and is easy to implement, with low maintenance cost.
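The abstract does not spell out the estimator, so the following is only a minimal sketch of one plausible reading of the idea: keep a small uniform sample of the table in memory, find the sample distance whose rank corresponds to rank k in the full table, and use it as the radius of a box-shaped range query. The function names, the `slack` inflation factor, and the choice of the max-norm distance are all assumptions made for illustration and are not taken from the paper.

```python
import math


def max_norm(rec, query):
    """Max-norm distance, chosen so that a radius maps to an axis-aligned box."""
    return max(abs(a - b) for a, b in zip(rec, query))


def estimate_radius(sample, query, k, population_size, slack=1.0):
    """Estimate a radius expected to enclose roughly k records of the full table.

    `slack` is a hypothetical inflation factor: values above 1 trade extra
    access cost for higher recall.
    """
    dists = sorted(max_norm(rec, query) for rec in sample)
    # Rank within the sample that corresponds to rank k in the full table.
    rank = math.ceil(slack * k * len(sample) / population_size)
    rank = min(max(rank, 1), len(dists))
    return dists[rank - 1]


def range_query(table, query, radius):
    """Stand-in for the DBMS range scan: records inside the box query +/- radius."""
    return [rec for rec in table
            if all(abs(a - b) <= radius for a, b in zip(rec, query))]


def top_k_via_range(table, sample, query, k, slack=1.0):
    """Answer a top-k query through an estimated range query.

    Returns the best candidates found (at most k) and the number of records
    the range query touched, a rough proxy for access cost.
    """
    radius = estimate_radius(sample, query, k, len(table), slack)
    candidates = range_query(table, query, radius)
    candidates.sort(key=lambda rec: max_norm(rec, query))
    return candidates[:k], len(candidates)


if __name__ == "__main__":
    table = [(i,) for i in range(100)]        # toy one-dimensional table
    sample = [(i,) for i in range(0, 100, 10)]  # every tenth record as the sample
    result, cost = top_k_via_range(table, sample, (0,), k=20, slack=2.0)
    print(len(result), cost)
```

In this toy setup, a `slack` of 1.0 yields a radius that is too small to capture all 20 nearest records, while a larger `slack` recovers them at the price of scanning more candidates. This mirrors the recall versus access cost trade-off that the abstract describes, although the paper's actual mapping strategies may differ.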


