Dynamic Reconfiguration of Data Parallel Programs

2016 28th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD)

Pub Date: 2016-10-01

DOI: 10.1109/SBAC-PAD.2016.32

Vinícius Dias, Wagner Meira Jr, D. Guedes

Citations: 2

Abstract

Given the large amount of data from different sources that has become available to researchers in multiple fields, Data Science has emerged as a new paradigm for exploring and extracting value from that data. In that context, new parallel processing environments with abstract programming interfaces, such as Spark, were proposed to simplify the development of distributed programs. Although such solutions have become widely used, achieving the best performance with them is still not always straightforward, despite the multiple run-time strategies they use. In this work we analyze some of the causes of performance degradation in such systems and, based on that analysis, propose a tool that improves performance by dynamically adjusting data partitioning and the degree of parallelism in recurrent applications based on previous executions. Applying that methodology yields consistent reductions in execution time for the applications considered, with gains of up to 50%.
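As a rough illustration of what adjusting the partition count "based on previous executions" could look like, the sketch below picks, for the next run of a recurrent job, the partition count with the lowest mean runtime observed so far. This is not the tool described in the paper: the function name `choose_partitions`, the `(num_partitions, runtime_seconds)` history format, the default of 8, and the sample numbers are all assumptions made for this example.

```python
from collections import defaultdict
from statistics import mean


def choose_partitions(history, default=8):
    """Return the partition count with the lowest mean runtime seen so far.

    history: iterable of (num_partitions, runtime_seconds) pairs recorded
    from earlier runs of the same recurrent job (hypothetical log format).
    default: value returned when no earlier run has been recorded.
    """
    runtimes_by_count = defaultdict(list)
    for num_partitions, runtime_seconds in history:
        runtimes_by_count[num_partitions].append(runtime_seconds)
    if not runtimes_by_count:
        return default
    # Compare configurations by their average runtime across past runs.
    return min(runtimes_by_count, key=lambda n: mean(runtimes_by_count[n]))


# Made-up sample history, for illustration only.
history = [(8, 120.0), (16, 95.0), (16, 105.0), (32, 110.0)]
best = choose_partitions(history)
print(best)
```

In a Spark job, a value chosen this way could be passed to a call such as `rdd.repartition(best)` before the expensive stage. A real system would also need to account for input size, data skew, and cluster resources, which this sketch ignores.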



