J. Mena, O. Shaaban, Víctor López, Marta Garcia, P. Carpenter, E. Ayguadé, Jesús Labarta
{"title":"Transparent load balancing of MPI programs using OmpSs-2@Cluster and DLB","authors":"J. Mena, O. Shaaban, Víctor López, Marta Garcia, P. Carpenter, E. Ayguadé, Jesús Labarta","doi":"10.1145/3545008.3545045","DOIUrl":null,"url":null,"abstract":"Abstract Load imbalance is a long-standing source of inefficiency in high performance computing. The situation has only got worse as applications and systems increase in complexity, e.g., adaptive mesh refinement, DVFS, memory hierarchies, power and thermal management, and manufacturing processes. Load balancing is often implemented in the application, but it obscures application logic and may need extensive code refactoring. This paper presents an automated and transparent dynamic load balancing approach for MPI applications with OmpSs-2 tasks, which relieves applications from this burden. Only local and trivial changes are required to the application. Our approach exploits the ability of OmpSs-2@Cluster to offload tasks for execution on other nodes, and it reallocates compute resources among ranks using the Dynamic Load Balancing (DLB) library. It employs LeWI to react to fine-grained load imbalances and DROM to address coarse-grained load imbalances by reserving cores on other nodes that can be reclaimed on demand. We use an expander graph to limit the amount of point-to-point communication and state. The results show 46% reduction in time-to-solution for micro-scale solid mechanics on 32 nodes and a 20% reduction beyond DLB for n-body on 16 nodes, when one node is running slow. A synthetic benchmark shows that performance is within 10% of optimal for an imbalance of up to 2.0 on 8 nodes. All software is released open source.","PeriodicalId":360504,"journal":{"name":"Proceedings of the 51st International Conference on Parallel Processing","volume":"56 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-08-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 51st International Conference on Parallel Processing","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3545008.3545045","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 1
Abstract
Abstract Load imbalance is a long-standing source of inefficiency in high performance computing. The situation has only got worse as applications and systems increase in complexity, e.g., adaptive mesh refinement, DVFS, memory hierarchies, power and thermal management, and manufacturing processes. Load balancing is often implemented in the application, but it obscures application logic and may need extensive code refactoring. This paper presents an automated and transparent dynamic load balancing approach for MPI applications with OmpSs-2 tasks, which relieves applications from this burden. Only local and trivial changes are required to the application. Our approach exploits the ability of OmpSs-2@Cluster to offload tasks for execution on other nodes, and it reallocates compute resources among ranks using the Dynamic Load Balancing (DLB) library. It employs LeWI to react to fine-grained load imbalances and DROM to address coarse-grained load imbalances by reserving cores on other nodes that can be reclaimed on demand. We use an expander graph to limit the amount of point-to-point communication and state. The results show 46% reduction in time-to-solution for micro-scale solid mechanics on 32 nodes and a 20% reduction beyond DLB for n-body on 16 nodes, when one node is running slow. A synthetic benchmark shows that performance is within 10% of optimal for an imbalance of up to 2.0 on 8 nodes. All software is released open source.