CS2: a new database synopsis for query estimation

Proceedings. ACM-SIGMOD International Conference on Management of Data

Pub Date: 2013-06-22

DOI: 10.1145/2463676.2463701

Feng Yu, W. Hou, Cheng Luo, D. Che, Mengxia Zhu

Citations: 36

Abstract

Fast and accurate estimates for complex queries are highly beneficial for large databases with heavy workloads. In this research, we propose a statistical summary for a database, called CS2 (Correlated Sample Synopsis), to provide rapid and accurate result size estimates for all queries with joins and arbitrary selections. Unlike state-of-the-art techniques, CS2 does not rely entirely on simple random samples, but consists mainly of correlated sample tuples that retain join relationships while using less storage. We introduce a statistical technique, called reverse sample, and design an estimator, called reverse estimator, to fully utilize correlated sample tuples for query estimation. We show theoretically, and confirm empirically, that the reverse estimator is unbiased and accurate when used with CS2. Extensive experiments on multiple datasets show that CS2 is fast to construct and yields more accurate estimates than existing methods under the same space budget.
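To make the idea of correlated sample tuples concrete, the following is a minimal sketch of a generic correlated-sampling scheme for a two-table key/foreign-key join. It is not the paper's reverse sample or reverse estimator, whose details are not given in the abstract. It assumes a simple random sample (without replacement) of a source relation, keeps every tuple of the other relation that joins with a sampled source tuple, and scales the number of qualifying joined pairs by N/n. All names here (build_correlated_sample, estimate_join_size, cust_id, amount) are hypothetical and chosen only for illustration.

```python
import random
from collections import defaultdict


def build_correlated_sample(source, child, key, fk, n, seed=0):
    """Sample n source rows at random and keep every child row that joins with them.

    Illustrative assumption: `source` and `child` are lists of dicts, and
    child[fk] references source[key].
    """
    rng = random.Random(seed)
    sampled = rng.sample(source, n)  # simple random sample without replacement

    # Index the child relation by its foreign key so joins are cheap to follow.
    by_fk = defaultdict(list)
    for row in child:
        by_fk[row[fk]].append(row)

    # Correlated tuples: the join relationship is preserved inside the sample.
    return [(r, s) for r in sampled for s in by_fk.get(r[key], [])]


def estimate_join_size(joined_sample, source_size, n, predicate=lambda r, s: True):
    """Scale the count of qualifying joined sample pairs by source_size / n."""
    hits = sum(1 for r, s in joined_sample if predicate(r, s))
    return source_size / n * hits


if __name__ == "__main__":
    # Toy data: customer i has (i % 4) orders.
    customers = [{"cust_id": i} for i in range(100)]
    orders = [{"cust_id": i, "amount": 10 * j}
              for i in range(100) for j in range(i % 4)]

    sample = build_correlated_sample(customers, orders, "cust_id", "cust_id", n=20)
    est = estimate_join_size(sample, len(customers), 20,
                             predicate=lambda r, s: s["amount"] >= 10)
    print("estimated join size with selection:", est)
```

Because each source tuple is included with probability n/N, scaling the qualifying pair count by N/n gives an unbiased estimate in this simplified setting; when n equals N the estimate is exact. Selections on either relation are applied to the sampled pairs at estimation time, which is why the join relationships need to be stored in the synopsis.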



