Exploiting the power of relational databases for efficient stream processing

IF 0.1 Q4 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS

ERCIM News Pub Date : 2009-03-24 DOI:10.1145/1516360.1516398

Erietta Liarou, M. Kersten


Citations: 61

Abstract

Stream applications have gained significant popularity in recent years, which has led to the development of specialized stream engines. These systems are designed from scratch with a different philosophy than today's database engines in order to cope with the requirements of stream applications. However, this means that they lack the power and sophisticated techniques of a full-fledged database system, which exploits techniques and algorithms accumulated over many years of database research.

In this paper, we take the opposite route and design a stream engine directly on top of a database kernel. Incoming tuples are stored directly upon arrival in a new kind of system table, called a basket. A continuous query can then be evaluated over its relevant baskets as a typical one-time query, exploiting the power of the relational engine. Once a tuple has been seen by all relevant queries/operators, it is dropped from its basket. A basket can be the input to a single query plan or to multiple similar query plans. Furthermore, a query plan can be split into multiple parts, each with its own input/output baskets, allowing for flexible load sharing and query scheduling. Unlike traditional stream engines, which process one tuple at a time, this model allows batch processing of tuples, e.g., querying a basket only after x tuples have arrived or after a time threshold has passed. Furthermore, we are not restricted to processing tuples in the order they arrive. Instead, we can selectively pick tuples from a basket based on the query requirements, exploiting a novel query component, the basket expression.

We investigate the opportunities and challenges that arise with such a direction and show that it carries significant advantages. We propose a complete architecture, the DataCell, which we implemented on top of an open-source column-oriented DBMS. A detailed analysis and experimental evaluation of the core algorithms, using both micro-benchmarks and the standard Linear Road benchmark, demonstrate the potential of this new approach.
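The basket model described in the abstract can be illustrated with a small Python sketch. This is not the DataCell implementation: the names `Basket`, `ContinuousQuery`, `run_ready`, and `demo`, and the `(sensor_id, reading)` tuple layout, are assumptions made only for illustration. The sketch covers three of the ideas above: tuples are stored on arrival, a query fires only once a batch threshold of x tuples is reached, and a tuple is dropped once every registered query has seen it. Time thresholds and basket expressions are left out.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set, Tuple

Row = Tuple[int, float]  # hypothetical schema: (sensor_id, reading)


@dataclass
class Basket:
    """Toy basket: buffers tuples until every registered query has seen them."""
    readers: Set[str] = field(default_factory=set)
    rows: Dict[int, Row] = field(default_factory=dict)
    seen_by: Dict[int, Set[str]] = field(default_factory=dict)
    next_id: int = 0

    def append(self, row: Row) -> None:
        # Store the tuple on arrival; no query runs at this point.
        self.rows[self.next_id] = row
        self.seen_by[self.next_id] = set()
        self.next_id += 1

    def pending(self, reader: str) -> List[int]:
        # Ids of tuples this reader has not processed yet, in arrival order.
        return [i for i in self.rows if reader not in self.seen_by[i]]

    def consume(self, reader: str) -> List[Row]:
        # Hand all pending tuples to the reader as one batch, then drop
        # every tuple that all registered readers have now seen.
        ids = self.pending(reader)
        batch = [self.rows[i] for i in ids]
        for i in ids:
            self.seen_by[i].add(reader)
            if self.seen_by[i] >= self.readers:
                del self.rows[i]
                del self.seen_by[i]
        return batch


@dataclass
class ContinuousQuery:
    name: str
    batch_size: int                          # fire only after this many tuples
    evaluate: Callable[[List[Row]], float]   # one-time query over a batch
    results: List[float] = field(default_factory=list)


def run_ready(basket: Basket, queries: List[ContinuousQuery]) -> None:
    # Fire each query whose batch threshold has been reached.
    for q in queries:
        if len(basket.pending(q.name)) >= q.batch_size:
            q.results.append(q.evaluate(basket.consume(q.name)))


def demo() -> Tuple[List[float], List[float], int]:
    avg = ContinuousQuery("avg", 2, lambda rows: sum(r[1] for r in rows) / len(rows))
    peak = ContinuousQuery("peak", 3, lambda rows: max(r[1] for r in rows))
    basket = Basket(readers={"avg", "peak"})
    for row in [(1, 10.0), (2, 20.0), (1, 30.0)]:
        basket.append(row)
        run_ready(basket, [avg, peak])
    return avg.results, peak.results, len(basket.rows)
```

In this toy run, `demo()` returns `([15.0], [30.0], 1)`: the averaging query fires after two tuples, the peak query fires after three, and the third tuple stays in the basket because the averaging query has not yet consumed it. A real engine would evaluate each batch with the relational query processor rather than a Python callable.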



Source journal

ERCIM News (COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS)

Self-citation rate

0.00%

Publication count