A study of partitioning and parallel UDF execution with the SAP HANA database

Scientific and Statistical Database Management: International Conference, SSDBM ...: Proceedings. International Conference on Scientific and Statistical Database Management. Pub Date: 2014-06-30. DOI: 10.1145/2618243.2618274

Philippe Grosse, Norman May, Wolfgang Lehner

Citations: 14

Abstract

Large-scale data analysis relies on custom code both for preparing the data and for the core analysis algorithms. The map-reduce framework offers a simple model for parallelizing custom code, but it does not integrate well with relational databases. Likewise, the literature on query optimization in relational databases has largely ignored user-defined functions (UDFs). In this paper, we discuss annotations for user-defined functions that facilitate optimizations considering both relational operators and UDFs. We focus on optimizations that enable the parallel execution of relational operators and UDFs for a number of typical patterns. A study on real-world data investigates the opportunities for parallelizing complex data flows that contain both relational operators and UDFs.
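To make the idea of an annotated UDF concrete, the following is a minimal illustrative sketch in Python. It is not the paper's annotation syntax and does not use any SAP HANA API. The decorator `partitioned_by`, the executor `run_parallel`, and the example UDF `total_sales` are all hypothetical names introduced here. The sketch assumes an annotation that declares a UDF safe to run independently on each partition of its input, which lets an executor split the rows by the annotated key and process the partitions concurrently.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def partitioned_by(key):
    """Hypothetical annotation: the UDF may run independently per value of `key`."""
    def decorate(udf):
        udf.partition_key = key  # metadata an optimizer could inspect
        return udf
    return decorate


@partitioned_by("region")
def total_sales(rows):
    # Example UDF (assumed): reduces one partition of rows to a single output row.
    return {"region": rows[0]["region"], "total": sum(r["amount"] for r in rows)}


def run_parallel(udf, rows, workers=4):
    # Split the input on the annotated key, then apply the UDF to each partition concurrently.
    partitions = defaultdict(list)
    for row in rows:
        partitions[row[udf.partition_key]].append(row)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(udf, partitions.values()))


rows = [
    {"region": "EU", "amount": 10},
    {"region": "US", "amount": 5},
    {"region": "EU", "amount": 7},
]
results = sorted(run_parallel(total_sales, rows), key=lambda r: r["region"])
```

Without such an annotation, an executor has no way of knowing that the UDF tolerates partitioned input and would have to run it over the whole input serially. This is the kind of information the annotations discussed in the abstract are meant to expose.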


