CORES

ACM Transactions on Storage (TOS) Pub Date : 2019-06-26 DOI:10.1145/3321704

Weidong Wen, Yang Li, Wenhai Li, Lingfeng Deng, Yanxiang He

{"title":"CORES","authors":"Weidong Wen, Yang Li, Wenhai Li, Lingfeng Deng, Yanxiang He","doi":"10.1145/3321704","DOIUrl":null,"url":null,"abstract":"The relatively high cost of record deserialization is increasingly becoming the bottleneck of column-based storage systems in tree-structured applications [58]. Due to record transformation in the storage layer, unnecessary processing costs derived from fields and rows irrelevant to queries may be very heavy in nested schemas, significantly wasting the computational resources in large-scale analytical workloads. This leads to the question of how to reduce both the deserialization and IO costs of queries with highly selective filters following arbitrary paths in a nested schema. We present CORES (Column-Oriented Regeneration Embedding Scheme) to push highly selective filters down into column-based storage engines, where each filter consists of several filtering conditions on a field. By applying highly selective filters in the storage layer, we demonstrate that both the deserialization and IO costs could be significantly reduced. We show how to introduce fine-grained composition on filtering results. We generalize this technique by two pair-wise operations, rollup and drilldown, such that a series of conjunctive filters can effectively deliver their payloads in nested schema. The proposed methods are implemented on an open-source platform. For practical purposes, we highlight how to build a column storage engine and how to drive a query efficiently based on a cost model. We apply this design to the nested relational model especially when hierarchical entities are frequently required by ad hoc queries. The experiments, including a real workload and the modified TPCH benchmark, demonstrate that CORES improves the performance by 0.7×--26.9× compared to state-of-the-art platforms in scan-intensive workloads.","PeriodicalId":273014,"journal":{"name":"ACM Transactions on Storage (TOS)","volume":"15 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2019-06-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"13","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"ACM Transactions on Storage (TOS)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3321704","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 13

Abstract

The relatively high cost of record deserialization is increasingly becoming the bottleneck of column-based storage systems in tree-structured applications [58]. Due to record transformation in the storage layer, unnecessary processing costs derived from fields and rows irrelevant to queries may be very heavy in nested schemas, significantly wasting the computational resources in large-scale analytical workloads. This leads to the question of how to reduce both the deserialization and IO costs of queries with highly selective filters following arbitrary paths in a nested schema. We present CORES (Column-Oriented Regeneration Embedding Scheme) to push highly selective filters down into column-based storage engines, where each filter consists of several filtering conditions on a field. By applying highly selective filters in the storage layer, we demonstrate that both the deserialization and IO costs could be significantly reduced. We show how to introduce fine-grained composition on filtering results. We generalize this technique by two pair-wise operations, rollup and drilldown, such that a series of conjunctive filters can effectively deliver their payloads in nested schema. The proposed methods are implemented on an open-source platform. For practical purposes, we highlight how to build a column storage engine and how to drive a query efficiently based on a cost model. We apply this design to the nested relational model especially when hierarchical entities are frequently required by ad hoc queries. The experiments, including a real workload and the modified TPCH benchmark, demonstrate that CORES improves the performance by 0.7×--26.9× compared to state-of-the-art platforms in scan-intensive workloads.

查看原文本刊更多论文

核

相对较高的记录反序列化成本日益成为树结构应用中基于列的存储系统的瓶颈。由于存储层的记录转换，在嵌套模式中，与查询无关的字段和行产生的不必要的处理成本可能非常高，在大规模分析工作负载中严重浪费计算资源。这就导致了一个问题，即如何在嵌套模式中使用遵循任意路径的高度选择性过滤器来减少查询的反序列化和IO成本。我们提出了core(面向列的再生嵌入方案)，将高选择性过滤器推送到基于列的存储引擎中，其中每个过滤器由字段上的几个过滤条件组成。通过在存储层中应用高选择性滤波器，我们证明反序列化和IO成本都可以显着降低。我们将展示如何在过滤结果中引入细粒度组合。我们通过两个成对操作(rollup和drilldown)对该技术进行了推广，这样一系列联合过滤器就可以在嵌套模式中有效地交付它们的有效负载。所提出的方法在一个开源平台上实现。出于实际目的，我们重点介绍了如何构建列存储引擎以及如何基于成本模型高效地驱动查询。我们将这种设计应用于嵌套关系模型，特别是当特别查询经常需要分层实体时。包括真实工作负载和修改后的TPCH基准测试在内的实验表明，在扫描密集型工作负载中，与最先进的平台相比，内核的性能提高了0.7 -26.9倍。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

求助全文

约1分钟内获得全文求助全文

来源期刊

ACM Transactions on Storage (TOS)

自引率

0.00%

发文量