A Data-Aware Partitioning and Optimization Method for Large-Scale Workflows in Hybrid Computing Environments
Rubing Duan, Xiaorong Li
2013 International Conference on Parallel and Distributed Systems, published 2013-12-15
DOI: 10.1109/ICPADS.2013.29 (https://doi.org/10.1109/ICPADS.2013.29)
While hybrid computing environments offer good potential for achieving high performance at low economic cost, they also introduce a broad set of unpredictable overheads, especially when running data-intensive applications. This paper describes a novel approach that refines workflow structures and optimizes intermediate data transfers for large-scale scientific workflows containing thousands (or even millions) of tasks. The proposed method includes pre- and post-partitioning of workflows and data-flow optimization. First, it partitions a workflow by identifying the critical path of the task graph. Second, it controls the granularity of partitions to reduce the complexity of the task graph so that large-scale workflows can be processed. Third, it optimizes the data flow based on the scheduling to minimize communication overhead. The approach can handle complex data flows and significantly reduces data transfer by replacing individual tasks according to their data dependencies. We conducted experiments using real applications such as Montage and Broadband, and the results demonstrate the effectiveness of our methods in achieving low execution time with low communication overhead in hybrid computing environments.
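The first step named in the abstract, identifying the critical path of the task graph, can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `critical_path`, the `runtime` and `edges` inputs, and the cost model (task runtime plus per-edge data transfer time) are all assumptions made for illustration. The sketch finds the longest path through a task DAG using a topological order.

```python
from collections import defaultdict, deque


def critical_path(runtime, edges):
    """Return (length, path) of the longest path in a task DAG.

    runtime: dict mapping task -> execution time (hypothetical units)
    edges:   dict mapping (src, dst) -> data transfer time on that edge
    Both inputs are illustrative assumptions, not taken from the paper.
    """
    succ = defaultdict(list)
    indeg = {t: 0 for t in runtime}
    for (u, v) in edges:
        succ[u].append(v)
        indeg[v] += 1

    # Kahn's algorithm: process tasks only after all their predecessors.
    ready = deque(t for t in runtime if indeg[t] == 0)
    order = []
    while ready:
        u = ready.popleft()
        order.append(u)
        for v in succ[u]:
            indeg[v] -= 1
            if indeg[v] == 0:
                ready.append(v)
    if len(order) != len(runtime):
        raise ValueError("task graph contains a cycle")

    # finish[t] = longest time from any entry task to the end of t.
    finish = dict(runtime)
    pred = {t: None for t in runtime}
    for u in order:
        for v in succ[u]:
            cand = finish[u] + edges[(u, v)] + runtime[v]
            if cand > finish[v]:
                finish[v] = cand
                pred[v] = u

    # Walk back from the task that finishes last.
    node = max(finish, key=finish.get)
    length = finish[node]
    path = []
    while node is not None:
        path.append(node)
        node = pred[node]
    path.reverse()
    return length, path


# Hypothetical four-task workflow: a feeds b and c, which both feed d.
runtime = {"a": 2, "b": 3, "c": 1, "d": 2}
edges = {("a", "b"): 1, ("a", "c"): 4, ("b", "d"): 1, ("c", "d"): 1}
print(critical_path(runtime, edges))  # (10, ['a', 'c', 'd'])
```

In this toy example the path through `b` would be critical if only runtimes counted (7 versus 5), but the large transfer on the edge from `a` to `c` makes the path through `c` critical (10 versus 9). That is one way to see why a partitioning step would need to account for intermediate data as well as computation.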