Vinita V. Deodhar, Hrushit Parikh, Ada Gavrilovska, S. Pande
2015 International Conference on Parallel Architecture and Compilation (PACT), published 2015-10-18. DOI: 10.1109/PACT.2015.40 (https://doi.org/10.1109/PACT.2015.40)
Compiler Assisted Load Balancing on Large Clusters
Balancing tasks across processing nodes is critical for achieving speedup on large-scale clusters. Load balancing schemes typically detect an imbalance and then migrate load from an overloaded processing node to an idle or lightly loaded one, so the accuracy of load estimation critically affects their performance. On large-scale clusters, the latency (and energy) of migrating load between processing nodes is also a significant overhead, and any misstep in load estimation can trigger costly migrations and performance losses. Currently, sophisticated systems such as Charm++ estimate load through profile-driven or feedback-driven approaches, but these approaches must be rethought for workloads such as Adaptive Mesh Refinement (AMR) and multiscale physics, where load variations can be dynamic and rapid. In this work we propose a compiler-based framework that performs precise prediction of the forthcoming workload. The compiler-driven load prediction technique statically analyzes a task, derives an expression that predicts the task's load, and hoists that expression as early as possible in the control flow. The compiler also inserts corrector expressions at strategic program points; these refine both the probability that the load is reached and the load estimate itself. The predictor and corrector expressions are evaluated at runtime, so the predicted load is refined as execution proceeds and is eventually used by the load balancer to make efficient migration decisions. We present an implementation in the Rose compiler and the Charm++ parallel programming framework. We demonstrate the effectiveness of the framework on key benchmarks that exhibit dynamic load variations and show how it helps the load balancing schemes in Charm++ deliver significant gains.
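The predictor and corrector idea can be illustrated with a small sketch. This is not the authors' implementation and uses neither the Rose nor the Charm++ API: every name here (`LoadEstimate`, `predictor`, `corrector`, `pick_migration`), the linear cost model, and the numbers are assumptions made purely for illustration. It shows an early estimate computed at task entry, a later refinement once a guarding branch is resolved, and a toy balancer that consumes the estimates.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass
class LoadEstimate:
    """Hypothetical record of predicted work for one task."""
    work: float         # estimated cost if the heavy region executes
    probability: float  # estimated chance that the heavy region is reached

    @property
    def expected(self) -> float:
        # Expected load = cost of the region weighted by its reachability.
        return self.work * self.probability


def predictor(num_cells: int, cost_per_cell: float, p_refine: float) -> LoadEstimate:
    """Stand-in for a hoisted predictor expression, evaluated at task entry.

    Assumes (for illustration only) that load is linear in the cell count.
    """
    return LoadEstimate(work=num_cells * cost_per_cell, probability=p_refine)


def corrector(est: LoadEstimate, needs_refinement: bool,
              refined_cells: int, cost_per_cell: float) -> LoadEstimate:
    """Stand-in for a corrector expression, evaluated after the branch resolves.

    The reachability probability collapses to 0 or 1, and the work estimate
    is recomputed from the cell count that is now known.
    """
    if not needs_refinement:
        return LoadEstimate(work=est.work, probability=0.0)
    return LoadEstimate(work=refined_cells * cost_per_cell, probability=1.0)


def pick_migration(loads: Dict[str, float]) -> Tuple[str, str]:
    """Toy balancer: propose moving work from the most to the least loaded node."""
    src = max(loads, key=loads.get)
    dst = min(loads, key=loads.get)
    return src, dst


if __name__ == "__main__":
    early = predictor(num_cells=1000, cost_per_cell=2.0, p_refine=0.25)
    late = corrector(early, needs_refinement=True,
                     refined_cells=4000, cost_per_cell=2.0)
    print(early.expected, late.expected)
    print(pick_migration({"n0": late.expected, "n1": 500.0, "n2": 100.0}))
```

In this sketch the early estimate lets a balancer act before the heavy region runs, while the corrected estimate replaces it as soon as more of the control flow is known; how the actual framework derives and places these expressions is described in the paper itself.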