Cyclic Workflow Execution Mechanism on Top of MapReduce Framework
Authors: Rong Wu, Liang Shuai, Huaming Liao
Published in: 2011 Seventh International Conference on Semantics, Knowledge and Grids (2011-10-24)
DOI: 10.1109/SKG.2011.46 (https://doi.org/10.1109/SKG.2011.46)
Citation count: 1
Abstract
The MapReduce programming model has been used in many kinds of data-intensive processing and analysis projects because of its ease of use and good scalability. In this paper, we discuss the execution mechanism of cyclic workflows on top of the MapReduce framework. We propose a novel cycle elimination algorithm that decomposes a cyclic workflow into DAG (Directed Acyclic Graph) sub-workflows. In each iteration, it dynamically and recursively searches for the maximum DAG sub-workflow according to the current decision result of the decision node. We also present a DAG sub-workflow scheduling strategy, which comprises a DAG grouping mechanism and MapReduce task mapping. Finally, we propose an intermediate data transmission mechanism named Partition Pushing, which can improve the possible parallelism between the executions of dependent jobs. Experiments show that the proposed workflow execution mechanism schedules cyclic workflows efficiently by improving the parallelism between dependent jobs, and consequently reduces the workflow makespan by 20%-60%.
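To make the idea of cycle elimination concrete, the following Python sketch shows one possible way to unroll a cyclic workflow into a sequence of DAG sub-workflows. It is an illustration only and is not the algorithm from the paper, whose details the abstract does not give. It assumes that every cycle passes through a decision node, that a decision node selects exactly one successor (or none) each time it runs, and that jobs are plain callbacks rather than MapReduce jobs. All names (`extract_dag`, `topo_order`, `run_cyclic`, `make_loop_decision`) are hypothetical.

```python
from collections import deque


def extract_dag(edges, decisions, starts):
    """Collect the nodes reachable from `starts` without crossing a decision node.

    Assumption: every cycle passes through a decision node, so the result is acyclic.
    """
    seen, queue = set(starts), deque(starts)
    while queue:
        node = queue.popleft()
        if node in decisions:  # branch not resolved yet, so stop expanding here
            continue
        for nxt in edges.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def topo_order(edges, decisions, nodes):
    """Kahn's algorithm restricted to `nodes`; out-edges of decision nodes are ignored."""
    succ = {
        n: [] if n in decisions else [m for m in edges.get(n, []) if m in nodes]
        for n in nodes
    }
    indeg = {n: 0 for n in nodes}
    for n in nodes:
        for m in succ[n]:
            indeg[m] += 1
    ready = deque(sorted(n for n in nodes if indeg[n] == 0))
    order = []
    while ready:
        n = ready.popleft()
        order.append(n)
        for m in succ[n]:
            indeg[m] -= 1
            if indeg[m] == 0:
                ready.append(m)
    if len(order) != len(nodes):
        raise ValueError("found a cycle that does not pass through a decision node")
    return order


def run_cyclic(edges, decisions, entry, run_job):
    """Repeatedly extract and run a DAG sub-workflow until no branch is chosen."""
    trace, starts = [], [entry]
    while starts:
        nodes = extract_dag(edges, decisions, starts)
        starts = []
        for n in topo_order(edges, decisions, nodes):
            run_job(n)
            trace.append(n)
            if n in decisions:
                chosen = decisions[n]()  # current decision result
                if chosen is not None:
                    starts.append(chosen)  # next sub-workflow starts here
    return trace


def make_loop_decision(times, back, forward):
    """Hypothetical decision: take the back edge `times` times, then go forward."""
    remaining = [times]

    def decide():
        if remaining[0] > 0:
            remaining[0] -= 1
            return back
        return forward

    return decide


if __name__ == "__main__":
    edges = {"load": ["step"], "step": ["check"], "check": ["step", "report"]}
    decisions = {"check": make_loop_decision(2, "step", "report")}
    print(run_cyclic(edges, decisions, "load", lambda job: None))
    # ['load', 'step', 'check', 'step', 'check', 'step', 'check', 'report']
```

In this sketch each pass of the `while` loop corresponds to one DAG sub-workflow. A real scheduler would submit the jobs of a sub-workflow to the cluster and could run independent jobs in parallel, which the sequential loop here does not attempt.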