A scheduling mechanism for multiple MapReduce jobs in a workflow application (position paper)
Dongjin Yoo, K. Sim
2012 Computing, Communications and Applications Conference, 21 February 2012
DOI: 10.1109/COMCOMAP.2012.6154882
MapReduce is currently an attractive model for data-intensive applications due to its simple programming interface, high scalability, and fault-tolerance capability. It is well suited to applications that process large data sets on distributed resources, such as web data analysis, bioinformatics, and high-performance computing. There have been many studies of job-scheduling mechanisms for MapReduce in shared clusters. However, there is also a need to schedule workflow services composed of multiple MapReduce tasks with precedence dependencies across multiple processing nodes. The contribution of this paper is a scheduling mechanism for a workflow service containing multiple MapReduce jobs. The workflow application has precedence-dependency constraints among its tasks, represented as a directed acyclic graph (DAG). In addition, to reduce data-transfer cost under the cluster's limited bisection bandwidth, a data-dependency criterion should be considered when scheduling multiple MapReduce jobs in a workflow. The proposed scheduling mechanism provides 1) scheduling of MapReduce tasks subject to precedence constraints and 2) a pre-data-placement method that considers data-dependency constraints to save data-transfer cost over the network.
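The abstract does not include an algorithm listing, but the core idea of part 1) — ordering MapReduce jobs so that each runs only after its precedence dependencies in the DAG are satisfied — can be sketched with a topological sort (Kahn's algorithm). The job names, the `deps` structure, and the `schedule_workflow` function below are illustrative assumptions, not the paper's actual implementation:

```python
from collections import deque

def schedule_workflow(jobs, deps):
    """Order MapReduce jobs so every job runs after its prerequisites.

    jobs: list of job names in the workflow DAG.
    deps: dict mapping a job to the set of jobs whose output it consumes.
    Returns a valid execution order, or raises if the graph has a cycle.
    """
    # Count unfinished prerequisites per job and build the forward edges.
    indegree = {j: len(deps.get(j, set())) for j in jobs}
    children = {j: [] for j in jobs}
    for job, parents in deps.items():
        for p in parents:
            children[p].append(job)

    # Jobs with no pending dependencies are ready to run immediately.
    ready = deque(j for j in jobs if indegree[j] == 0)
    order = []
    while ready:
        j = ready.popleft()
        order.append(j)
        # Completing j may unblock its downstream jobs.
        for c in children[j]:
            indegree[c] -= 1
            if indegree[c] == 0:
                ready.append(c)

    if len(order) != len(jobs):
        raise ValueError("cycle detected: a workflow must be a DAG")
    return order

# Hypothetical workflow: one extraction job feeds two analysis jobs,
# whose outputs are joined by a final report job.
jobs = ["extract", "wordcount", "pagerank", "report"]
deps = {"wordcount": {"extract"},
        "pagerank": {"extract"},
        "report": {"wordcount", "pagerank"}}
print(schedule_workflow(jobs, deps))
```

A real scheduler would additionally weigh the data-dependency criterion from part 2) — e.g., preferring to place a ready job on nodes already holding its input blocks — rather than emitting a bare sequential order.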