Pando: Enhanced Data Skipping with Logical Data Partitioning

Proc. VLDB Endow. Pub Date : 2023-05-01 DOI:10.14778/3598581.3598601

Sivaprasad Sudhir, Wenbo Tao, N. Laptev, Cyrille Habis, Michael J. Cafarella, S. Madden

{"title":"Pando: Enhanced Data Skipping with Logical Data Partitioning","authors":"Sivaprasad Sudhir, Wenbo Tao, N. Laptev, Cyrille Habis, Michael J. Cafarella, S. Madden","doi":"10.14778/3598581.3598601","DOIUrl":null,"url":null,"abstract":"With enormous volumes of data, quickly retrieving data that is relevant to a query is essential for achieving high performance. Modern cloud-based database systems often partition the data into blocks and employ various techniques to skip irrelevant blocks during query execution. Several algorithms, often based on historical properties of a workload of queries run over the data, have been proposed to tune the physical layout of data to reduce the number of blocks accessed. The effectiveness of these methods at skipping blocks depends on what metadata is stored and how well the physical data layout aligns with the queries. Existing work on automatic physical database design misses significant opportunities in skipping blocks because it ignores logical predicates in the workload that exhibit strongly correlated results. In this paper, we present Pando which enables significantly better block skipping than past methods by informing physical layout decisions with correlation-aware logical partitioning. Across a range of benchmark and real-world workloads, Pando attains up to 2.8X reduction in the number of blocks scanned and up to 2.3X speedup in end-to-end query execution time over the state-of-the-art techniques.","PeriodicalId":20467,"journal":{"name":"Proc. VLDB Endow.","volume":"20 1","pages":"2316-2329"},"PeriodicalIF":0.0000,"publicationDate":"2023-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proc. VLDB Endow.","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.14778/3598581.3598601","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 0

Abstract

With enormous volumes of data, quickly retrieving data that is relevant to a query is essential for achieving high performance. Modern cloud-based database systems often partition the data into blocks and employ various techniques to skip irrelevant blocks during query execution. Several algorithms, often based on historical properties of a workload of queries run over the data, have been proposed to tune the physical layout of data to reduce the number of blocks accessed. The effectiveness of these methods at skipping blocks depends on what metadata is stored and how well the physical data layout aligns with the queries. Existing work on automatic physical database design misses significant opportunities in skipping blocks because it ignores logical predicates in the workload that exhibit strongly correlated results. In this paper, we present Pando which enables significantly better block skipping than past methods by informing physical layout decisions with correlation-aware logical partitioning. Across a range of benchmark and real-world workloads, Pando attains up to 2.8X reduction in the number of blocks scanned and up to 2.3X speedup in end-to-end query execution time over the state-of-the-art techniques.

查看原文本刊更多论文

Pando:增强数据跳跃与逻辑数据分区

对于大量的数据，快速检索与查询相关的数据对于实现高性能至关重要。现代基于云的数据库系统经常将数据划分为块，并使用各种技术在查询执行期间跳过不相关的块。已经提出了几种算法(通常基于在数据上运行的查询工作负载的历史属性)来调优数据的物理布局，以减少访问的块数量。这些跳过块的方法的有效性取决于存储的元数据以及物理数据布局与查询的对齐程度。现有的自动物理数据库设计工作错过了跳过块的重要机会，因为它忽略了工作负载中显示强烈相关结果的逻辑谓词。在本文中，我们提出了Pando，它通过使用关联感知逻辑分区通知物理布局决策，从而比过去的方法实现更好的块跳转。在一系列基准测试和实际工作负载中，Pando的扫描块数量减少了2.8倍，端到端查询执行时间加快了2.3倍。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

求助全文

约1分钟内获得全文求助全文

来源期刊

Proc. VLDB Endow.

自引率

0.00%

发文量