PipeFlow Engine: Pipeline Scheduling with Distributed Workflow Made Simple

2013 International Conference on Parallel and Distributed Systems. Pub Date: 2013-12-15. DOI: 10.1109/ICPADS.2013.31

Yin Li, Chuang Lin


Abstract

A distributed computing system is considered a fundamental architecture for extending resources such as computation speed, storage capacity, and network bandwidth, all of which are limited on a single processor. Emerging big-data processing techniques such as Hadoop take advantage of distributed servers to accomplish scalable parallel computation. Large-scale processing jobs can run interdependently on different servers, or even on different clusters, and be combined into a workflow that produces meaningful outputs. In this paper, we analyze the common demands of big-data processing and of distributed big-data workflow processing. Based on this analysis, we design the PipeFlow Engine, whose features are matched to each of these demands. It orchestrates all involved jobs and schedules them in a batched pipeline mode. We also present two online ranking algorithms built on PipeFlow and share our experience and best practices from using it.
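The abstract does not describe the scheduler's internals, so the following is only a minimal sketch of what scheduling interdependent jobs "in a batched pipeline mode" could look like. It assumes unit-time jobs and a simple rule: a job may start batch b once every upstream job has finished batch b and the job itself has finished batch b - 1. The function name `pipeline_schedule` and the job names `extract`, `aggregate`, and `rank` are hypothetical and are not taken from the paper.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def pipeline_schedule(deps, num_batches):
    """Return {(job, batch): start_tick} for a unit-time batched pipeline.

    deps maps each job name to the set of jobs it depends on.
    This is an illustrative rule, not the paper's algorithm: a job starts
    batch b only after all upstream jobs finished batch b and the job
    itself finished batch b - 1.
    """
    # Upstream jobs come first in the static topological order.
    order = list(TopologicalSorter(deps).static_order())
    start = {}
    for batch in range(num_batches):
        for job in order:
            ready = 0
            # Wait for every upstream job to finish the same batch.
            for upstream in deps.get(job, ()):
                ready = max(ready, start[(upstream, batch)] + 1)
            # Wait for this job to finish its previous batch.
            if batch > 0:
                ready = max(ready, start[(job, batch - 1)] + 1)
            start[(job, batch)] = ready
    return start


# Hypothetical three-stage workflow: extract -> aggregate -> rank.
workflow = {
    "extract": set(),
    "aggregate": {"extract"},
    "rank": {"aggregate"},
}
schedule = pipeline_schedule(workflow, num_batches=3)

for (job, batch), tick in sorted(schedule.items(), key=lambda kv: kv[1]):
    print(f"tick {tick}: {job} processes batch {batch}")
```

Under these assumptions, different jobs work on different batches during the same tick, so the three-stage chain finishes three batches in five ticks instead of the nine a strictly sequential run would need.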

