用于迭代过度分解应用程序的工作窃取和基于持久性的负载平衡器

IEEE International Symposium on High-Performance Parallel Distributed Computing Pub Date : 2012-06-18 DOI:10.1145/2287076.2287103

J. Lifflander, S. Krishnamoorthy, L. Kalé

{"title":"用于迭代过度分解应用程序的工作窃取和基于持久性的负载平衡器","authors":"J. Lifflander, S. Krishnamoorthy, L. Kalé","doi":"10.1145/2287076.2287103","DOIUrl":null,"url":null,"abstract":"Applications often involve iterative execution of identical or slowly evolving calculations. Such applications require incremental rebalancing to improve load balance across iterations. In this paper, we consider the design and evaluation of two distinct approaches to addressing this challenge: persistence-based load balancing and work stealing. The work to be performed is overdecomposed into tasks, enabling automatic rebalancing by the middleware. We present a hierarchical persistence-based rebalancing algorithm that performs localized incremental rebalancing. We also present an active-message-based retentive work stealing algorithm optimized for iterative applications on distributed memory machines. We demonstrate low overheads and high efficiencies on the full NERSC Hopper (146,400 cores) and ALCF Intrepid systems (163,840 cores), and on up to 128,000 cores on OLCF Titan.","PeriodicalId":330072,"journal":{"name":"IEEE International Symposium on High-Performance Parallel Distributed Computing","volume":"1 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2012-06-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"69","resultStr":"{\"title\":\"Work stealing and persistence-based load balancers for iterative overdecomposed applications\",\"authors\":\"J. Lifflander, S. Krishnamoorthy, L. Kalé\",\"doi\":\"10.1145/2287076.2287103\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Applications often involve iterative execution of identical or slowly evolving calculations. Such applications require incremental rebalancing to improve load balance across iterations. In this paper, we consider the design and evaluation of two distinct approaches to addressing this challenge: persistence-based load balancing and work stealing. The work to be performed is overdecomposed into tasks, enabling automatic rebalancing by the middleware. We present a hierarchical persistence-based rebalancing algorithm that performs localized incremental rebalancing. We also present an active-message-based retentive work stealing algorithm optimized for iterative applications on distributed memory machines. We demonstrate low overheads and high efficiencies on the full NERSC Hopper (146,400 cores) and ALCF Intrepid systems (163,840 cores), and on up to 128,000 cores on OLCF Titan.\",\"PeriodicalId\":330072,\"journal\":{\"name\":\"IEEE International Symposium on High-Performance Parallel Distributed Computing\",\"volume\":\"1 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2012-06-18\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"69\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"IEEE International Symposium on High-Performance Parallel Distributed Computing\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/2287076.2287103\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE International Symposium on High-Performance Parallel Distributed Computing","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/2287076.2287103","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 69

摘要

应用程序通常涉及相同的或缓慢发展的计算的迭代执行。这样的应用程序需要增量的再平衡来改善跨迭代的负载平衡。在本文中，我们考虑设计和评估两种不同的方法来解决这一挑战:基于持久性的负载平衡和工作窃取。要执行的工作被过度分解为任务，从而允许中间件自动重新平衡。我们提出了一种分层的基于持久性的再平衡算法，该算法执行局部增量再平衡。我们还提出了一种基于活动消息的保留工作窃取算法，该算法针对分布式内存机器上的迭代应用进行了优化。我们在全NERSC Hopper(146,400个内核)和ALCF Intrepid系统(163,840个内核)以及OLCF Titan上高达128,000个内核上展示了低开销和高效率。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Work stealing and persistence-based load balancers for iterative overdecomposed applications

Applications often involve iterative execution of identical or slowly evolving calculations. Such applications require incremental rebalancing to improve load balance across iterations. In this paper, we consider the design and evaluation of two distinct approaches to addressing this challenge: persistence-based load balancing and work stealing. The work to be performed is overdecomposed into tasks, enabling automatic rebalancing by the middleware. We present a hierarchical persistence-based rebalancing algorithm that performs localized incremental rebalancing. We also present an active-message-based retentive work stealing algorithm optimized for iterative applications on distributed memory machines. We demonstrate low overheads and high efficiencies on the full NERSC Hopper (146,400 cores) and ALCF Intrepid systems (163,840 cores), and on up to 128,000 cores on OLCF Titan.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

IEEE International Symposium on High-Performance Parallel Distributed Computing

自引率

0.00%

发文量