Taming Computation Skews of Block-Oriented Iterative Scientific Applications in MapReduce Systems
Xin Yang, Min Li, Ze Yu, Xiaolin Li
2014 IEEE 7th International Conference on Cloud Computing, 2014-06-27
DOI: 10.1109/CLOUD.2014.33 (https://doi.org/10.1109/CLOUD.2014.33)
Citations: 0
Abstract
Scientists increasingly rely on big data techniques to extract discoveries quickly from large volumes of scientific data. Properly partitioning workloads is essential for fully exploiting parallelism, but it is difficult for applications whose computations change from iteration to iteration. Computation skews are inevitable when block-oriented iterative scientific applications run in MapReduce systems. This paper proposes iPart, an autonomic workload partitioning system for taming these skews. iPart introduces a workload control loop into the conventional execution of MapReduce jobs: workload estimates, expressed as execution time, are collected in the reduce phase and fed back to the partition phase to update the partitioning plan. Computation skews are thus detected and addressed by iteratively adapting the partitioning to changes in computation. Two adaptive partitioning methods based on the binary partitioning method are presented. Experimental evaluations with two simulated applications on synthetic and real-world data show that iPart responds to computation changes and adapts partitioning quickly and accurately.
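The feedback idea in the abstract can be illustrated with a small sketch. This is not iPart's implementation: the abstract does not describe its two adaptive methods in detail, so the code below assumes a one-dimensional sequence of blocks, a plain recursive bisection as the "binary partitioning" step, and a max-to-mean load ratio as the skew measure. The names `binary_partition`, `skew`, `run_control_loop`, and the `threshold` parameter are all hypothetical.

```python
from itertools import accumulate


def binary_partition(costs, parts):
    """Split blocks 0..len(costs)-1 into `parts` contiguous (lo, hi) ranges
    of roughly equal estimated cost by recursive bisection.
    Assumes len(costs) >= parts."""
    prefix = [0.0] + list(accumulate(costs))

    def split(lo, hi, k):
        if k == 1:
            return [(lo, hi)]
        left_k = k // 2
        # Cost the left side should carry, proportional to its share of partitions.
        target = prefix[lo] + (prefix[hi] - prefix[lo]) * left_k / k
        # Pick the cut closest to the target, leaving enough blocks on each side.
        candidates = range(lo + left_k, hi - (k - left_k) + 1)
        cut = min(candidates, key=lambda c: abs(prefix[c] - target))
        return split(lo, cut, left_k) + split(cut, hi, k - left_k)

    return split(0, len(costs), parts)


def skew(costs, plan):
    """Ratio of the heaviest partition load to the mean load (1.0 = balanced)."""
    loads = [sum(costs[lo:hi]) for lo, hi in plan]
    return max(loads) / (sum(loads) / len(loads))


def run_control_loop(costs_per_iteration, parts, threshold=1.2):
    """Feed measured per-block times back into the partitioning step.
    Returns the (plan, skew) pair observed in each iteration."""
    n_blocks = len(costs_per_iteration[0])
    # No measurements yet, so assume every block costs the same.
    plan = binary_partition([1.0] * n_blocks, parts)
    history = []
    for measured in costs_per_iteration:
        # Stand-in for the reduce phase: per-block execution times are known here.
        observed = skew(measured, plan)
        history.append((plan, observed))
        if observed > threshold:
            # Stand-in for the partition phase: rebuild the plan from the estimates.
            plan = binary_partition(measured, parts)
    return history


if __name__ == "__main__":
    hot = [1, 1, 1, 1, 8, 8, 1, 1]  # two expensive blocks in the middle
    for plan, observed in run_control_loop([hot, hot], parts=2):
        print(plan, round(observed, 3))
```

In this sketch the first iteration runs with an even split and sees a heavily loaded second partition; the second iteration uses a plan rebuilt from the measured times. Using the previous iteration's times as the estimate for the next one is itself an assumption, which holds only when computation changes gradually between iterations.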