集成运行时方法的多级负载平衡

Seonmyeong Bak, Harshitha Menon, Sam White, M. Diener, L. Kalé
{"title":"集成运行时方法的多级负载平衡","authors":"Seonmyeong Bak, Harshitha Menon, Sam White, M. Diener, L. Kalé","doi":"10.1109/CCGRID.2018.00018","DOIUrl":null,"url":null,"abstract":"The recent trend of increasing numbers of cores per chip has resulted in vast amounts of on-node parallelism. These high core counts result in hardware variability that introduces imbalance. Applications are also becoming more complex, re-sulting in dynamic load imbalance. Load imbalance of any kind can result in loss of performance and system utilization. We address the challenge of handling both transient and persistent load imbalances while maintaining locality with low overhead. In this paper, we propose an integrated runtime system that combines the Charm++ distributed programming model with concurrent tasks to mitigate load imbalances within and across shared memory address spaces. It utilizes a periodic assignment of work to cores based on load measurement, in combination with user created tasks to handle load imbalance. We integrate OpenMP with Charm++ to enable creation of potential tasks via OpenMP's parallel loop construct. This is also available to MPI applications through the Adaptive MPI implementation. We demonstrate the benefits of our work on three applications. We show improvements of Lassen by 29.6% on Cori and 46.5% on Theta. We also demonstrate the benefits on a Charm++ application, ChaNGa by 25.7% on Theta, as well as an MPI proxy application, Kripke, using Adaptive MPI.","PeriodicalId":321027,"journal":{"name":"2018 18th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGRID)","volume":"3 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2018-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"17","resultStr":"{\"title\":\"Multi-Level Load Balancing with an Integrated Runtime Approach\",\"authors\":\"Seonmyeong Bak, Harshitha Menon, Sam White, M. Diener, L. Kalé\",\"doi\":\"10.1109/CCGRID.2018.00018\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"The recent trend of increasing numbers of cores per chip has resulted in vast amounts of on-node parallelism. These high core counts result in hardware variability that introduces imbalance. Applications are also becoming more complex, re-sulting in dynamic load imbalance. Load imbalance of any kind can result in loss of performance and system utilization. We address the challenge of handling both transient and persistent load imbalances while maintaining locality with low overhead. In this paper, we propose an integrated runtime system that combines the Charm++ distributed programming model with concurrent tasks to mitigate load imbalances within and across shared memory address spaces. It utilizes a periodic assignment of work to cores based on load measurement, in combination with user created tasks to handle load imbalance. We integrate OpenMP with Charm++ to enable creation of potential tasks via OpenMP's parallel loop construct. This is also available to MPI applications through the Adaptive MPI implementation. We demonstrate the benefits of our work on three applications. We show improvements of Lassen by 29.6% on Cori and 46.5% on Theta. We also demonstrate the benefits on a Charm++ application, ChaNGa by 25.7% on Theta, as well as an MPI proxy application, Kripke, using Adaptive MPI.\",\"PeriodicalId\":321027,\"journal\":{\"name\":\"2018 18th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGRID)\",\"volume\":\"3 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2018-05-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"17\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2018 18th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGRID)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/CCGRID.2018.00018\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2018 18th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGRID)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/CCGRID.2018.00018","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 17

摘要

最近每个芯片的核心数量不断增加的趋势导致了大量的节点上并行性。这些高核数导致硬件可变性,从而导致不平衡。应用程序也变得越来越复杂,导致动态负载不平衡。任何类型的负载不平衡都可能导致性能和系统利用率的损失。我们解决了处理瞬时和持久负载不平衡的挑战,同时保持低开销的局部性。在本文中,我们提出了一个集成的运行时系统,该系统将Charm++分布式编程模型与并发任务相结合,以减轻共享内存地址空间内和跨地址空间的负载不平衡。它利用基于负载测量的周期性工作分配给核心,并结合用户创建的任务来处理负载不平衡。我们将OpenMP与Charm++集成在一起,通过OpenMP的并行循环结构创建潜在任务。这也可以通过自适应MPI实现提供给MPI应用程序。我们将在三个应用程序上演示我们的工作带来的好处。我们显示Lassen在Cori和Theta上分别提高了29.6%和46.5%。我们还演示了在Theta上提高25.7%的Charm++应用程序ChaNGa以及使用自适应MPI的MPI代理应用程序Kripke的好处。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
Multi-Level Load Balancing with an Integrated Runtime Approach
The recent trend of increasing numbers of cores per chip has resulted in vast amounts of on-node parallelism. These high core counts result in hardware variability that introduces imbalance. Applications are also becoming more complex, re-sulting in dynamic load imbalance. Load imbalance of any kind can result in loss of performance and system utilization. We address the challenge of handling both transient and persistent load imbalances while maintaining locality with low overhead. In this paper, we propose an integrated runtime system that combines the Charm++ distributed programming model with concurrent tasks to mitigate load imbalances within and across shared memory address spaces. It utilizes a periodic assignment of work to cores based on load measurement, in combination with user created tasks to handle load imbalance. We integrate OpenMP with Charm++ to enable creation of potential tasks via OpenMP's parallel loop construct. This is also available to MPI applications through the Adaptive MPI implementation. We demonstrate the benefits of our work on three applications. We show improvements of Lassen by 29.6% on Cori and 46.5% on Theta. We also demonstrate the benefits on a Charm++ application, ChaNGa by 25.7% on Theta, as well as an MPI proxy application, Kripke, using Adaptive MPI.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信