A generic front-stage for semi-stream processing

Proceedings of the 22nd ACM International Conference on Information & Knowledge Management. Pub Date: 2013-10-27. DOI: 10.1145/2505515.2505734

M. Naeem, Gerald Weber, G. Dobbie, C. Lutteroth

Citations: 3

Abstract

Recently, a number of semi-stream join algorithms have been published. The typical system setup for these consists of one fast stream input that has to be joined with a disk-based relation R. These semi-stream join approaches typically perform the join with a limited main memory partition assigned to them, which is generally not large enough to hold the whole relation R. We propose a caching approach that can be used as a front-stage for different semi-stream join algorithms, resulting in significant performance gains for common applications. We analyze our approach in the context of a seminal semi-stream join, MESHJOIN (Mesh Join), and provide a cost model for the resulting semi-stream join algorithm, which we call CMESHJOIN (Cached Mesh Join). The algorithm takes advantage of skewed distributions; this article presents results for Zipfian distributions of the type that appears in many applications.
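The front-stage idea can be illustrated with a minimal sketch. It assumes that a small set of frequently matched tuples of R is already held in an in-memory hash table, and that stream tuples which miss this cache are handed to some downstream semi-stream join such as MESHJOIN. The class name `CacheFrontStage`, the `forward_miss` callback, and the static cache contents are hypothetical choices made for illustration; the abstract does not specify how the cache is populated or maintained, so this is not the authors' CMESHJOIN implementation.

```python
from typing import Any, Callable, Dict, Hashable, Iterable, Iterator, Tuple

StreamTuple = Tuple[Hashable, Any]  # (join key, stream payload)


class CacheFrontStage:
    """Hypothetical cache front-stage placed before a semi-stream join.

    Stream tuples whose key is in the in-memory cache are joined at once;
    all other tuples are forwarded to the downstream join (assumed here to
    be a callback, e.g. the input queue of a MESHJOIN-style operator).
    """

    def __init__(
        self,
        hot_rows: Dict[Hashable, Any],
        forward_miss: Callable[[StreamTuple], None],
    ) -> None:
        # Assumption: the hot part of R is given up front and never changes.
        self.cache = dict(hot_rows)
        self.forward_miss = forward_miss
        self.hits = 0
        self.misses = 0

    def process(self, stream: Iterable[StreamTuple]) -> Iterator[Tuple[Hashable, Any, Any]]:
        """Yield joined tuples for cache hits; forward misses downstream."""
        for key, payload in stream:
            if key in self.cache:
                self.hits += 1
                yield (key, payload, self.cache[key])
            else:
                self.misses += 1
                self.forward_miss((key, payload))


if __name__ == "__main__":
    pending = []  # stands in for the downstream join's input queue
    stage = CacheFrontStage({1: "r1", 2: "r2"}, pending.append)
    stream = [(1, "a"), (3, "b"), (1, "c"), (2, "d"), (4, "e")]
    joined = list(stage.process(stream))
    print(joined)
    print(pending)
    print(stage.hits, stage.misses)
```

Under a skewed key distribution such as a Zipfian one, a few keys account for most stream tuples, so even a small cache of this kind could answer a large share of the stream without touching the disk-based relation. The hit and miss counters give a simple way to observe that share.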
