就地查询引擎的矢量化

Proceedings of the 2016 International Conference on Management of Data Pub Date : 2016-06-26 DOI:10.1145/2882903.2914829

Panagiotis Sioulas, A. Ailamaki

{"title":"就地查询引擎的矢量化","authors":"Panagiotis Sioulas, A. Ailamaki","doi":"10.1145/2882903.2914829","DOIUrl":null,"url":null,"abstract":"Database systems serve a wide range of use cases efficiently, but require data to be loaded and adapted to the system's execution engine. This pre-processing step is a bottleneck to the analysis of the increasingly large and heterogeneous datasets. Therefore, numerous research efforts advocate for querying each dataset in situ,i.e., without pre-loading it in a DBMS. On the other hand, performing analysis over raw data entails numerous overheads because of the potentially inefficient data representations. In this paper, we investigate the effect of vector processing on raw data querying. We enhance the operators of a query engine to use SIMD operations. Specifically, we examine the effect of SIMD on two different cases: the scan operators that perform the CPU-intensive task of input parsing, and the part of the query pipeline that performs a selection and computes an aggregate. We show that a vectorized approach has a lot of potential to improve performance, which nevertheless comes with trade-offs.","PeriodicalId":20483,"journal":{"name":"Proceedings of the 2016 International Conference on Management of Data","volume":"73 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2016-06-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":"{\"title\":\"Vectorizing an In Situ Query Engine\",\"authors\":\"Panagiotis Sioulas, A. Ailamaki\",\"doi\":\"10.1145/2882903.2914829\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Database systems serve a wide range of use cases efficiently, but require data to be loaded and adapted to the system's execution engine. This pre-processing step is a bottleneck to the analysis of the increasingly large and heterogeneous datasets. Therefore, numerous research efforts advocate for querying each dataset in situ,i.e., without pre-loading it in a DBMS. On the other hand, performing analysis over raw data entails numerous overheads because of the potentially inefficient data representations. In this paper, we investigate the effect of vector processing on raw data querying. We enhance the operators of a query engine to use SIMD operations. Specifically, we examine the effect of SIMD on two different cases: the scan operators that perform the CPU-intensive task of input parsing, and the part of the query pipeline that performs a selection and computes an aggregate. We show that a vectorized approach has a lot of potential to improve performance, which nevertheless comes with trade-offs.\",\"PeriodicalId\":20483,\"journal\":{\"name\":\"Proceedings of the 2016 International Conference on Management of Data\",\"volume\":\"73 1\",\"pages\":\"\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2016-06-26\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"1\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the 2016 International Conference on Management of Data\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/2882903.2914829\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 2016 International Conference on Management of Data","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/2882903.2914829","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 1

摘要

数据库系统有效地服务于广泛的用例，但需要加载数据并使其适应系统的执行引擎。这一预处理步骤是分析日益庞大和异构数据集的瓶颈。因此，许多研究工作提倡就地查询每个数据集，即。，而无需在DBMS中预加载它。另一方面，对原始数据执行分析会带来大量开销，因为可能存在低效的数据表示。本文研究了向量处理对原始数据查询的影响。我们增强了查询引擎的操作符，以使用SIMD操作。具体来说，我们将研究SIMD在两种不同情况下的影响:执行输入解析的cpu密集型任务的扫描操作符，以及执行选择和计算聚合的查询管道部分。我们展示了矢量化方法在提高性能方面有很大的潜力，然而这是有代价的。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Vectorizing an In Situ Query Engine

Database systems serve a wide range of use cases efficiently, but require data to be loaded and adapted to the system's execution engine. This pre-processing step is a bottleneck to the analysis of the increasingly large and heterogeneous datasets. Therefore, numerous research efforts advocate for querying each dataset in situ,i.e., without pre-loading it in a DBMS. On the other hand, performing analysis over raw data entails numerous overheads because of the potentially inefficient data representations. In this paper, we investigate the effect of vector processing on raw data querying. We enhance the operators of a query engine to use SIMD operations. Specifically, we examine the effect of SIMD on two different cases: the scan operators that perform the CPU-intensive task of input parsing, and the part of the query pipeline that performs a selection and computes an aggregate. We show that a vectorized approach has a lot of potential to improve performance, which nevertheless comes with trade-offs.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

Proceedings of the 2016 International Conference on Management of Data

自引率

0.00%

发文量