使用UPC的可扩展动态负载平衡

Stephen L. Olivier, J. Prins
{"title":"使用UPC的可扩展动态负载平衡","authors":"Stephen L. Olivier, J. Prins","doi":"10.1109/ICPP.2008.19","DOIUrl":null,"url":null,"abstract":"An asynchronous work-stealing implementation of dynamic load balance is implemented using Unified Parallel C (UPC) and evaluated using the Unbalanced Tree Search (UTS) benchmark [Olivier, S., et al., 2007]. The UTS benchmark presents a synthetic tree-structured search space that is highly imbalanced. Parallel implementation of the search requires continuous dynamic load balancing to keep all processors engaged in the search. Our implementation achieves better scaling and parallel efficiency in both shared memory and distributed memory settings than previous efforts using UPC [Olivier, S., et al., 2007] and MPI [Dinan, J., et al., 2007]. We observe parallel efficiency of 80% using 1024 processors performing over 85,000 total load balancing operations per second continuously. The UPC programming model provides substantial simplifications in the expression of the asynchronous work stealing protocol compared with MPI. However, to obtain performance portability with UPC in both shared memory and distributed memory settings requires the careful use of one sided reads and writes to minimize the impact of high latency communication. Additional protocol improvements are made to improve dissemination of available work and to decrease the cost of termination detection.","PeriodicalId":388408,"journal":{"name":"2008 37th International Conference on Parallel Processing","volume":"5 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2008-09-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"69","resultStr":"{\"title\":\"Scalable Dynamic Load Balancing Using UPC\",\"authors\":\"Stephen L. Olivier, J. Prins\",\"doi\":\"10.1109/ICPP.2008.19\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"An asynchronous work-stealing implementation of dynamic load balance is implemented using Unified Parallel C (UPC) and evaluated using the Unbalanced Tree Search (UTS) benchmark [Olivier, S., et al., 2007]. The UTS benchmark presents a synthetic tree-structured search space that is highly imbalanced. Parallel implementation of the search requires continuous dynamic load balancing to keep all processors engaged in the search. Our implementation achieves better scaling and parallel efficiency in both shared memory and distributed memory settings than previous efforts using UPC [Olivier, S., et al., 2007] and MPI [Dinan, J., et al., 2007]. We observe parallel efficiency of 80% using 1024 processors performing over 85,000 total load balancing operations per second continuously. The UPC programming model provides substantial simplifications in the expression of the asynchronous work stealing protocol compared with MPI. However, to obtain performance portability with UPC in both shared memory and distributed memory settings requires the careful use of one sided reads and writes to minimize the impact of high latency communication. Additional protocol improvements are made to improve dissemination of available work and to decrease the cost of termination detection.\",\"PeriodicalId\":388408,\"journal\":{\"name\":\"2008 37th International Conference on Parallel Processing\",\"volume\":\"5 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2008-09-09\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"69\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2008 37th International Conference on Parallel Processing\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ICPP.2008.19\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2008 37th International Conference on Parallel Processing","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICPP.2008.19","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 69

摘要

使用统一并行C (UPC)实现了动态负载平衡的异步工作窃取实现,并使用不平衡树搜索(UTS)基准进行了评估[Olivier, S.等,2007]。UTS基准给出了一个高度不平衡的合成树结构搜索空间。搜索的并行实现需要持续的动态负载平衡,以保持所有处理器都参与搜索。与之前使用UPC [Olivier, S., et ., 2007]和MPI [Dinan, J., et ., 2007]相比,我们的实现在共享内存和分布式内存设置中实现了更好的可伸缩性和并行效率。我们观察到,使用1024个处理器每秒连续执行超过85,000个总负载平衡操作,并行效率达到80%。与MPI相比,UPC编程模型大大简化了异步工作窃取协议的表达式。然而,为了在共享内存和分布式内存设置中获得UPC的性能可移植性,需要谨慎使用单侧读写,以最大限度地减少高延迟通信的影响。另外还对协议进行了改进,以改进现有工作的传播,并降低终止检测的成本。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
Scalable Dynamic Load Balancing Using UPC
An asynchronous work-stealing implementation of dynamic load balance is implemented using Unified Parallel C (UPC) and evaluated using the Unbalanced Tree Search (UTS) benchmark [Olivier, S., et al., 2007]. The UTS benchmark presents a synthetic tree-structured search space that is highly imbalanced. Parallel implementation of the search requires continuous dynamic load balancing to keep all processors engaged in the search. Our implementation achieves better scaling and parallel efficiency in both shared memory and distributed memory settings than previous efforts using UPC [Olivier, S., et al., 2007] and MPI [Dinan, J., et al., 2007]. We observe parallel efficiency of 80% using 1024 processors performing over 85,000 total load balancing operations per second continuously. The UPC programming model provides substantial simplifications in the expression of the asynchronous work stealing protocol compared with MPI. However, to obtain performance portability with UPC in both shared memory and distributed memory settings requires the careful use of one sided reads and writes to minimize the impact of high latency communication. Additional protocol improvements are made to improve dissemination of available work and to decrease the cost of termination detection.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:604180095
Book学术官方微信