Managing parallelism for stream processing in the cloud

HotCDP '12 · Published: 2012-04-10 · DOI: 10.1145/2169090.2169091

Nathan Backman, Rodrigo Fonseca, U. Çetintemel

Citations: 39

Abstract

Stream processing applications run continuously and have varying load. Cloud infrastructures present an attractive option to meet these fluctuating computational demands. Coordinating such resources to meet end-to-end latency objectives efficiently is important in preventing the frivolous use of cloud resources. We present a framework that parallelizes and schedules workflows of stream operators, in real-time, to meet latency objectives. It supports data- and task-parallel processing of all workflow operators, by all computing nodes, while maintaining the ordering properties of sorted data streams. We show that a latency-oriented operator scheduling policy coupled with the diversification of computing node responsibilities encourages parallelism models that achieve end-to-end latency-minimization goals. We demonstrate the effectiveness of our framework with preliminary experimental results using a variety of real-world applications on heterogeneous clusters.
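The abstract claims that operators run data-parallel across nodes while the sorted order of the stream is preserved, but it does not say how. One common way to get this property is to partition a sorted stream so that each partition stays sorted, apply the operator independently per partition, and restore global order with a k-way merge on the timestamp. The sketch below illustrates that generic technique under stated assumptions: tuples are `(timestamp, payload)` pairs, partitioning is round-robin, and the operator is a stateless per-tuple function. The function names are hypothetical, and this is not the paper's own mechanism.

```python
import heapq
from typing import Callable, Iterable, Iterator, List, Tuple

# Hypothetical tuple shape: (timestamp, payload). Input is assumed sorted by timestamp.
Tup = Tuple[int, str]


def partition_round_robin(stream: Iterable[Tup], n_workers: int) -> List[List[Tup]]:
    """Split a sorted stream across workers; each partition remains sorted."""
    parts: List[List[Tup]] = [[] for _ in range(n_workers)]
    for i, t in enumerate(stream):
        parts[i % n_workers].append(t)
    return parts


def run_operator(part: List[Tup], op: Callable[[str], str]) -> List[Tup]:
    """Apply a stateless per-tuple operator; timestamps are untouched, so local order holds."""
    return [(ts, op(payload)) for ts, payload in part]


def parallel_ordered(
    stream: Iterable[Tup], op: Callable[[str], str], n_workers: int
) -> Iterator[Tup]:
    """Data-parallel execution followed by an order-restoring k-way merge (an assumed design)."""
    parts = partition_round_robin(stream, n_workers)
    # In a real deployment each partition would be processed on a separate node.
    outputs = [run_operator(p, op) for p in parts]
    # Merging the locally sorted outputs on timestamp restores the global sort order.
    return heapq.merge(*outputs, key=lambda t: t[0])


stream = [(1, "a"), (2, "b"), (3, "c"), (4, "d"), (5, "e")]
result = list(parallel_ordered(stream, str.upper, 2))
print(result)  # [(1, 'A'), (2, 'B'), (3, 'C'), (4, 'D'), (5, 'E')]
```

Because each worker only ever sees an already-sorted subsequence, the merge needs to hold at most one pending tuple per worker. A stateful or windowed operator would need a partitioning scheme that keeps related tuples together, which this sketch does not attempt.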

