Massively-parallel stream processing under QoS constraints with Nephele

IEEE International Symposium on High-Performance Parallel and Distributed Computing. Pub Date: 2012-06-18. DOI: 10.1145/2287076.2287117

Björn Lohrmann, Daniel Warneke, O. Kao


Citation count: 21

Abstract

Today, a growing number of commodity devices, like mobile phones or smart meters, are equipped with rich sensors and capable of producing continuous data streams. The sheer number of these devices and the resulting overall data volumes of the streams raise new challenges with respect to the scalability of existing stream processing systems.

At the same time, massively-parallel data processing systems like MapReduce have proven that they scale to large numbers of nodes and efficiently organize data transfers between them. Many of these systems also provide streaming capabilities. However, unlike traditional stream processors, these systems have so far disregarded the QoS requirements of prospective stream processing applications.

In this paper we address this gap. First, we analyze common design principles of today's parallel data processing frameworks and identify those principles that provide degrees of freedom in trading off the QoS goals of latency and throughput. Second, we propose a scheme that allows these frameworks to detect violations of user-defined latency constraints and to optimize the job execution without manual interaction, in order to meet these constraints while keeping the throughput as high as possible.

As a proof of concept, we implemented our approach for our parallel data processing framework Nephele and evaluated its effectiveness through a comparison with Hadoop Online. For a multimedia streaming application, we demonstrate an improvement in processing latency by a factor of at least 15 while preserving high data throughput when needed.
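The abstract does not say which knobs the scheme adjusts, so the following is only an illustrative sketch of the general idea of reacting to violations of a user-defined latency constraint. It assumes, purely for illustration, that the tunable parameter is an output buffer size: smaller buffers ship records sooner (lower latency), while larger buffers amortize per-transfer overhead (higher throughput). The class `BufferController`, its fields, and the halving/doubling policy are hypothetical and are not taken from Nephele or from the paper.

```python
from dataclasses import dataclass


@dataclass
class BufferController:
    """Hypothetical controller that adapts a buffer size to a latency bound."""

    latency_limit_ms: float          # user-defined latency constraint (assumed name)
    buffer_bytes: int = 32 * 1024    # current output buffer size (assumed default)
    min_bytes: int = 1024            # lower clamp for the buffer size
    max_bytes: int = 64 * 1024       # upper clamp for the buffer size

    def update(self, measured_latency_ms: float) -> int:
        """Return the new buffer size after one latency measurement."""
        if measured_latency_ms > self.latency_limit_ms:
            # Constraint violated: halve the buffer so records leave sooner.
            self.buffer_bytes = max(self.min_bytes, self.buffer_bytes // 2)
        elif measured_latency_ms < 0.5 * self.latency_limit_ms:
            # Ample slack: double the buffer to favor throughput again.
            self.buffer_bytes = min(self.max_bytes, self.buffer_bytes * 2)
        # Otherwise the latency is within bounds without much slack: keep the size.
        return self.buffer_bytes


# Feed a few made-up latency measurements (in ms) against a 300 ms constraint.
controller = BufferController(latency_limit_ms=300.0)
sizes = [controller.update(latency) for latency in (900.0, 650.0, 280.0, 120.0)]
print(sizes)
```

In this toy run the buffer shrinks while the constraint is violated, holds once the measured latency falls below the limit, and grows again only when there is substantial slack, which mirrors the stated goal of meeting the constraint while keeping throughput as high as possible.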



