Adaptive Runtime Features for Distributed Graph Algorithms

J. Firoz, Marcin Zalewski, Joshua D. Suetterlein, A. Lumsdaine
{"title":"Adaptive Runtime Features for Distributed Graph Algorithms","authors":"J. Firoz, Marcin Zalewski, Joshua D. Suetterlein, A. Lumsdaine","doi":"10.1109/HiPC.2018.00018","DOIUrl":null,"url":null,"abstract":"Performance of distributed graph algorithms can benefit greatly by forming rapport between algorithmic abstraction and the underlying runtime system that is responsible for scheduling work and exchanging messages. However, due to their dynamic and irregular nature of computation, distributed graph algorithms written in different programming models impose varying degrees of workload pressure on the runtime. To cope with such vastly different workload characteristics, a runtime has to make several trade-offs. One such trade-off arises, for example, when the runtime scheduler has to choose among alternatives such as whether to execute algorithmic work, or progress the network by probing network buffers, or throttle sending messages (termed flow control). This trade-off decides between optimizing the throughput of a runtime scheduler by increasing the rate of execution of algorithmic work, and reducing the latency of the network messages. Another trade-off exists when a decision has to be made about when to send aggregated messages in buffers (message coalescing). This decision chooses between trading off latency for network bandwidth and vice versa. At any instant, such trade-offs emphasize either on improving the quantity of work being executed (by maximizing the scheduler throughput) or on improving the quality of work (by prioritizing better work). However, encoding static policies for different runtime features (such as flow control, coalescing) can prevent graph algorithms from achieving their full potentials, thus can under-mine the actual performance of a distributed graph algorithm . In this paper, we investigate runtime support for distributed graph algorithms in the context of two paradigms: variants of well-known Bulk-Synchronous Parallel model and asynchronous programming model. We explore generic runtime features such as message coalescing (aggregation) and flow control and show that execution policies of these features need to be adjusted over time to make a positive impact on the execution time of a distributed graph algorithm. Since synchronous and asynchronous graph algorithms have different workload characteristics, not all of such runtime features may be good candidates for adaptation. Each of these algorithmic paradigms may require different set of features to be adapted over time. We demonstrate which set of feature(s) can be useful in each case to achieve the right balance of work in the runtime layer. Existing implementation of different graph algorithms can benefit from adapting dynamic policies in the underlying runtime.","PeriodicalId":113335,"journal":{"name":"2018 IEEE 25th International Conference on High Performance Computing (HiPC)","volume":"52 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2018-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2018 IEEE 25th International Conference on High Performance Computing (HiPC)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/HiPC.2018.00018","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

Abstract

The performance of distributed graph algorithms can benefit greatly from a close rapport between the algorithmic abstraction and the underlying runtime system responsible for scheduling work and exchanging messages. However, due to the dynamic and irregular nature of their computation, distributed graph algorithms written in different programming models impose varying degrees of workload pressure on the runtime. To cope with such vastly different workload characteristics, a runtime has to make several trade-offs. One such trade-off arises, for example, when the runtime scheduler has to choose among alternatives such as executing algorithmic work, progressing the network by probing network buffers, or throttling outgoing messages (termed flow control). This trade-off weighs optimizing the throughput of the runtime scheduler, by increasing the rate at which algorithmic work is executed, against reducing the latency of network messages. Another trade-off arises when deciding when to send the aggregated messages accumulated in buffers (message coalescing), which trades latency for network bandwidth and vice versa. At any instant, such trade-offs emphasize either the quantity of work being executed (by maximizing scheduler throughput) or the quality of work (by prioritizing better work). However, encoding static policies for runtime features such as flow control and coalescing can prevent graph algorithms from reaching their full potential and thus undermine the actual performance of a distributed graph algorithm. In this paper, we investigate runtime support for distributed graph algorithms in the context of two paradigms: variants of the well-known Bulk Synchronous Parallel (BSP) model and the asynchronous programming model. We explore generic runtime features such as message coalescing (aggregation) and flow control, and show that the execution policies of these features need to be adjusted over time to have a positive impact on the execution time of a distributed graph algorithm. Since synchronous and asynchronous graph algorithms have different workload characteristics, not all such runtime features are good candidates for adaptation; each algorithmic paradigm may require a different set of features to be adapted over time. We demonstrate which features are useful in each case to achieve the right balance of work in the runtime layer. Existing implementations of graph algorithms can benefit from adopting dynamic policies in the underlying runtime.
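The abstract describes the coalescing and flow-control trade-offs only at a high level. As a rough illustration of the idea of a dynamic (rather than static) policy, the following self-contained C++ sketch interleaves algorithmic work with network progress and re-tunes a coalescing threshold from the amount of outstanding work. All names (CoalescingBuffer, AdaptiveScheduler) and the specific thresholds are hypothetical and not taken from the paper.

```cpp
// Minimal sketch (not the authors' implementation) of two adaptive runtime
// features: message coalescing with an adjustable flush threshold, and a
// scheduler loop that alternates between algorithmic work and network progress.
#include <cstdio>
#include <deque>
#include <vector>

struct Message { int target_vertex; int payload; };

// Aggregates outgoing messages and "sends" them once the buffer reaches an
// adjustable threshold: a small threshold favors latency, a large one bandwidth.
class CoalescingBuffer {
public:
    explicit CoalescingBuffer(std::size_t threshold) : threshold_(threshold) {}

    void push(const Message& m) {
        pending_.push_back(m);
        if (pending_.size() >= threshold_) flush();
    }

    void flush() {
        if (pending_.empty()) return;
        std::printf("flush: sending %zu coalesced messages (threshold=%zu)\n",
                    pending_.size(), threshold_);
        pending_.clear();
    }

    // Adaptation hook: the scheduler changes this over time.
    void set_threshold(std::size_t t) { threshold_ = (t == 0 ? 1 : t); }

private:
    std::size_t threshold_;
    std::vector<Message> pending_;
};

// Toy scheduler loop with a dynamic policy: execute work, periodically
// progress the "network", and re-tune coalescing based on pending work.
class AdaptiveScheduler {
public:
    AdaptiveScheduler() : buffer_(64) {}

    void enqueue(int vertex) { work_.push_back(vertex); }

    void run() {
        std::size_t processed = 0;
        while (!work_.empty()) {
            // 1. Execute one unit of algorithmic work (e.g. relax a vertex).
            int v = work_.front();
            work_.pop_front();
            buffer_.push(Message{v, /*payload=*/v * 2});
            ++processed;

            // 2. Periodically progress the network by draining buffers.
            if (processed % 128 == 0 || work_.empty()) {
                buffer_.flush();
                // 3. Adapt: plenty of local work left -> favor bandwidth
                //    (larger buffers); little work -> favor latency.
                buffer_.set_threshold(work_.size() > 1000 ? 256 : 16);
            }
        }
        buffer_.flush();
    }

private:
    CoalescingBuffer buffer_;
    std::deque<int> work_;
};

int main() {
    AdaptiveScheduler sched;
    for (int v = 0; v < 5000; ++v) sched.enqueue(v);
    sched.run();
    return 0;
}
```

In this sketch the policy is a simple threshold on queue length; the paper's point is that such policies should change during execution, since synchronous (BSP-style) and asynchronous algorithms put very different pressure on coalescing and flow control.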