Stream-join revisited in the context of epoch-based SQL continuous query

Proceedings. International Database Engineering and Applications Symposium Pub Date : 2012-08-08 DOI:10.1145/2351476.2351491

Qiming Chen, M. Hsu

Pages: 130-138

Citations: 1

Abstract

The current generation of stream processing systems is generally built separately from the query engine; it therefore lacks the expressive power of SQL and incurs significant overhead in data access and movement. This situation has motivated us to leverage the query engine for stream processing. Stream-join is a window operation where the key issue is how to punctuate and pair two or more correlated streams. In this work we tackle this issue in the specific context of query-engine-supported stream processing. We focus on the following problems: a SQL query is definable on bounded relational data, but stream data are unbounded; joining multiple streams is a stateful (thus history-sensitive) operation, but a SQL query only cares about the current state; further, a relational join typically requires re-scanning a relation in a nested loop, but by nature a stream cannot be re-captured, since reading a stream always returns newly incoming data. To leverage query processing for analyzing unbounded streams, we defined the Epoch-based Continuous Query (ECQ) model, which allows a SQL query to be executed epoch by epoch to process the stream data chunk by chunk. However, unlike multiple one-time queries, an ECQ is a single, continuous query instance across execution epochs, which keeps the continuity of the application state as required by history-sensitive operations such as sliding-window join. To join multiple streams, we further developed techniques to cache one or more consecutive data chunks falling in a sliding window across query execution epochs in the ECQ instance, so that they can be re-delivered from the cache. In this way, joining multiple streams and self-joining a single stream in a chunk-based window or sliding window, with various pairing schemes, are made possible. We extended the PostgreSQL engine to support the proposed approach. Our experience has demonstrated its value.
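The idea of caching consecutive chunks so that a stream can be "re-scanned" may be easier to see in a small sketch. The Python below is only an illustration under stated assumptions, not the authors' PostgreSQL extension: the names `punctuate`, `SlidingWindowJoin`, and `run_epoch` are hypothetical, epochs are assumed to be fixed-length ranges of integer timestamps, and the pairing scheme is assumed to be "current left chunk against the last N right chunks" with equality on a key.

```python
from collections import deque


def punctuate(stream, epoch_len, n_epochs):
    """Cut a stream of (ts, key, value) tuples into per-epoch chunks.

    Assumption: epoch number is ts // epoch_len and every ts falls in
    [0, epoch_len * n_epochs). Empty epochs yield empty chunks.
    """
    chunks = [[] for _ in range(n_epochs)]
    for ts, key, value in stream:
        chunks[ts // epoch_len].append((key, value))
    return chunks


class SlidingWindowJoin:
    """One long-lived instance that is invoked once per epoch.

    The cache survives between calls, which stands in for the state
    continuity that a single continuous query instance provides.
    """

    def __init__(self, window_epochs):
        # Holds the most recent right-stream chunks; deque drops the oldest.
        self.cache = deque(maxlen=window_epochs)

    def run_epoch(self, left_chunk, right_chunk):
        self.cache.append(right_chunk)
        matches = []
        for lkey, lval in left_chunk:
            # Cached chunks are re-delivered here, playing the role of the
            # inner-relation re-scan in a nested-loop join.
            for chunk in self.cache:
                for rkey, rval in chunk:
                    if lkey == rkey:
                        matches.append((lkey, lval, rval))
        return matches


# Hypothetical sample data: (timestamp, key, value)
left = [(0, "a", 1), (1, "b", 2), (5, "a", 3), (11, "b", 4)]
right = [(2, "a", 10), (6, "b", 20), (12, "a", 30)]

join = SlidingWindowJoin(window_epochs=2)
results = [
    join.run_epoch(l_chunk, r_chunk)
    for l_chunk, r_chunk in zip(punctuate(left, 5, 3), punctuate(right, 5, 3))
]
```

Here `results` holds one list of joined tuples per epoch. Creating a fresh `SlidingWindowJoin` for every epoch would lose the cached chunks, which corresponds to the abstract's contrast between an ECQ and a series of one-time queries.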

