Accelerating Main-Memory Table Scans with Partial Virtual Views

Proceedings of the 19th International Workshop on Data Management on New Hardware. Pub Date: 2023-06-18. DOI: 10.1145/3592980.3595315

F. Schuhknecht, Justus Henneberg


Citations: 2

Abstract

In main-memory column stores, column scans are among the basic operations performed when answering analytical queries. Typically, one or more columns must be filtered with respect to the given query predicate, which by default involves inspecting all data in those columns. Two strategies essentially exist to reduce the amount of data scanned: (1) Create a coarse-granular index on the column and use it for early pruning during each scan. Creating such an index is relatively lightweight, but accessing the relevant portions of the column through the index adds overhead to every scan. (2) Create materialized views that contain semantic portions of the column and filter on these. This enables fast scans, but it requires physical copying and causes significant space overhead. To break this trade-off, we propose a view-based strategy that avoids any physical copying of column data while providing optimal scan performance. We achieve this with tools of the virtual memory subsystem provided by the OS: at the lowest level, we materialize all columns in physical main memory. On top of that, we allow the creation of arbitrarily many partial views in virtual memory that map to subsets of the physical columns with certain properties of interest. Creation, maintenance, and use of these partial virtual views happen fully adaptively as a by-product of scan-based query processing.
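The abstract does not name the system calls involved, so the following is only a minimal sketch of how a partial view in virtual memory might be built on Linux. It assumes the column lives in an in-memory file created with `memfd_create`, and that a view is a reserved address range into which qualifying pages are aliased with `mmap` and `MAP_FIXED`. The names `column_t`, `view_t`, `column_create`, `view_create`, and `scan_count`, the range predicate `[lo, hi]`, and the page-granular selection are all illustrative assumptions, not the authors' implementation.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical column: physical pages live in an in-memory file, so the
   same page can be mapped at several virtual addresses. */
typedef struct {
    int fd;               /* memfd holding the physical pages */
    int32_t *data;        /* full mapping of the column */
    size_t n_pages;
    size_t vals_per_page;
} column_t;

/* Hypothetical partial view: a contiguous virtual range whose first
   n_pages pages alias selected pages of the column. */
typedef struct {
    int32_t *data;
    size_t n_pages;
} view_t;

static int column_create(column_t *c, size_t n_pages) {
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    assert(page % sizeof(int32_t) == 0);
    c->fd = memfd_create("column", 0);
    if (c->fd < 0) return -1;
    if (ftruncate(c->fd, (off_t)(n_pages * page)) != 0) return -1;
    c->data = mmap(NULL, n_pages * page, PROT_READ | PROT_WRITE,
                   MAP_SHARED, c->fd, 0);
    if (c->data == MAP_FAILED) return -1;
    c->n_pages = n_pages;
    c->vals_per_page = page / sizeof(int32_t);
    return 0;
}

/* Build a view over every page holding at least one value in [lo, hi].
   Here the pages are found by a dedicated pass; in an adaptive system
   this information would come out of a regular scan. */
static int view_create(const column_t *c, int32_t lo, int32_t hi, view_t *v) {
    size_t page = c->vals_per_page * sizeof(int32_t);
    size_t total = c->n_pages * page;
    /* Reserve address space only; no physical memory is committed. */
    char *base = mmap(NULL, total, PROT_NONE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (base == MAP_FAILED) return -1;
    size_t k = 0;
    for (size_t p = 0; p < c->n_pages; p++) {
        const int32_t *vals = c->data + p * c->vals_per_page;
        int hit = 0;
        for (size_t i = 0; i < c->vals_per_page && !hit; i++)
            hit = (vals[i] >= lo && vals[i] <= hi);
        if (!hit) continue;
        /* Alias physical page p at the next free slot of the view. */
        void *slot = mmap(base + k * page, page, PROT_READ,
                          MAP_SHARED | MAP_FIXED, c->fd, (off_t)(p * page));
        if (slot == MAP_FAILED) { munmap(base, total); return -1; }
        k++;
    }
    v->data = (int32_t *)base;
    v->n_pages = k;
    return 0;
}

/* Plain sequential scan: count values in [lo, hi] among n values. */
static size_t scan_count(const int32_t *data, size_t n,
                         int32_t lo, int32_t hi) {
    size_t cnt = 0;
    for (size_t i = 0; i < n; i++)
        cnt += (data[i] >= lo && data[i] <= hi);
    return cnt;
}
```

In this sketch, `scan_count(v.data, v.n_pages * c.vals_per_page, lo, hi)` runs the same tight loop as a full column scan, only over fewer pages, and no column data is copied: the view and the column share physical pages, so a write through `c.data` is visible through `v.data`. Selecting at page granularity means a view may still contain non-qualifying values, so the scan keeps its predicate check.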

