{"title":"管理云中流处理的并行性","authors":"Nathan Backman, Rodrigo Fonseca, U. Çetintemel","doi":"10.1145/2169090.2169091","DOIUrl":null,"url":null,"abstract":"Stream processing applications run continuously and have varying load. Cloud infrastructures present an attractive option to meet these fluctuating computational demands. Coordinating such resources to meet end-to-end latency objectives efficiently is important in preventing the frivolous use of cloud resources. We present a framework that parallelizes and schedules workflows of stream operators, in real-time, to meet latency objectives. It supports data- and task-parallel processing of all workflow operators, by all computing nodes, while maintaining the ordering properties of sorted data streams. We show that a latency-oriented operator scheduling policy coupled with the diversification of computing node responsibilities encourages parallelism models that achieve end-to-end latency-minimization goals. We demonstrate the effectiveness of our framework with preliminary experimental results using a variety of real-world applications on heterogeneous clusters.","PeriodicalId":183902,"journal":{"name":"HotCDP '12","volume":"295 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2012-04-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"39","resultStr":"{\"title\":\"Managing parallelism for stream processing in the cloud\",\"authors\":\"Nathan Backman, Rodrigo Fonseca, U. Çetintemel\",\"doi\":\"10.1145/2169090.2169091\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Stream processing applications run continuously and have varying load. Cloud infrastructures present an attractive option to meet these fluctuating computational demands. Coordinating such resources to meet end-to-end latency objectives efficiently is important in preventing the frivolous use of cloud resources. We present a framework that parallelizes and schedules workflows of stream operators, in real-time, to meet latency objectives. It supports data- and task-parallel processing of all workflow operators, by all computing nodes, while maintaining the ordering properties of sorted data streams. We show that a latency-oriented operator scheduling policy coupled with the diversification of computing node responsibilities encourages parallelism models that achieve end-to-end latency-minimization goals. 
We demonstrate the effectiveness of our framework with preliminary experimental results using a variety of real-world applications on heterogeneous clusters.\",\"PeriodicalId\":183902,\"journal\":{\"name\":\"HotCDP '12\",\"volume\":\"295 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2012-04-10\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"39\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"HotCDP '12\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/2169090.2169091\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"HotCDP '12","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/2169090.2169091","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Managing parallelism for stream processing in the cloud
Stream processing applications run continuously and have varying load. Cloud infrastructures present an attractive option to meet these fluctuating computational demands. Coordinating such resources to meet end-to-end latency objectives efficiently is important in preventing the frivolous use of cloud resources. We present a framework that parallelizes and schedules workflows of stream operators, in real-time, to meet latency objectives. It supports data- and task-parallel processing of all workflow operators, by all computing nodes, while maintaining the ordering properties of sorted data streams. We show that a latency-oriented operator scheduling policy coupled with the diversification of computing node responsibilities encourages parallelism models that achieve end-to-end latency-minimization goals. We demonstrate the effectiveness of our framework with preliminary experimental results using a variety of real-world applications on heterogeneous clusters.
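The abstract highlights data- and task-parallel processing of operators while preserving the ordering of sorted data streams. The paper itself includes no code; the following is a minimal sketch of one common way to obtain order-preserving data parallelism for a single operator: input tuples are tagged with sequence numbers, processed concurrently by workers, and released through a reorder buffer at the merge point. The operator, tuple format, and worker count are hypothetical placeholders, and this is an illustration of the general idea rather than the authors' framework.

```python
# Minimal sketch (assumed design, not the authors' framework): data-parallel
# processing of one stream operator with an order-preserving merge, so the
# output keeps the input stream's sorted order.
import heapq
from concurrent.futures import ThreadPoolExecutor, as_completed

def process_in_order(tuples, operator, num_workers=4):
    """Apply `operator` to each tuple in parallel; yield results in input order."""
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        # Tag each input tuple with a sequence number before scattering to workers.
        future_to_seq = {pool.submit(operator, t): seq for seq, t in enumerate(tuples)}

        reorder_buffer = []   # min-heap of (seq, result) awaiting in-order release
        next_seq = 0          # next sequence number the merged output may emit
        for fut in as_completed(future_to_seq):
            heapq.heappush(reorder_buffer, (future_to_seq[fut], fut.result()))
            # Release every buffered result whose predecessors have already been emitted.
            while reorder_buffer and reorder_buffer[0][0] == next_seq:
                yield heapq.heappop(reorder_buffer)[1]
                next_seq += 1

if __name__ == "__main__":
    # Hypothetical sorted stream of readings and a toy per-tuple operator.
    readings = [{"ts": i, "value": float(i)} for i in range(8)]
    smooth = lambda t: {**t, "value": t["value"] * 0.5}
    for out in process_in_order(readings, smooth):
        print(out)
```

In a full system along the lines the abstract describes, the degree of parallelism per operator and the placement of work on heterogeneous nodes would be chosen by the latency-oriented scheduler rather than fixed up front as in this sketch.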