Dynamic Reconfiguration of Data Parallel Programs

Vinícius Dias, Wagner Meira Jr, D. Guedes
{"title":"Dynamic Reconfiguration of Data Parallel Programs","authors":"Vinícius Dias, Wagner Meira Jr, D. Guedes","doi":"10.1109/SBAC-PAD.2016.32","DOIUrl":null,"url":null,"abstract":"Given the large amount of data from different sources that have become available to researchers in multiple fields, Data Science has emerged as a new paradigm for exploring and getting value from that data. In that context, new parallel processing environments with abstract programming interfaces, like Spark, were proposed to try to simplify the development of distributed programs. Although such solutions have become widely used, achieving the best performance with them is still not always straight-forward, despite the multiple run-time strategies they use. In this work we analyze some of the causes of performance degradation in such systems and, based on that analysis, we propose a tool to improve performance by dynamically adjusting data partitioning and parallelism degree in recurrent applications based on previous executions. Our results applying that methodology show consistent reductions in execution time for the applications considered, with gains of up to 50%.","PeriodicalId":361160,"journal":{"name":"2016 28th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD)","volume":"60 20","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2016-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2016 28th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/SBAC-PAD.2016.32","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 2

Abstract

Given the large amount of data from different sources that have become available to researchers in multiple fields, Data Science has emerged as a new paradigm for exploring and getting value from that data. In that context, new parallel processing environments with abstract programming interfaces, like Spark, were proposed to try to simplify the development of distributed programs. Although such solutions have become widely used, achieving the best performance with them is still not always straight-forward, despite the multiple run-time strategies they use. In this work we analyze some of the causes of performance degradation in such systems and, based on that analysis, we propose a tool to improve performance by dynamically adjusting data partitioning and parallelism degree in recurrent applications based on previous executions. Our results applying that methodology show consistent reductions in execution time for the applications considered, with gains of up to 50%.
数据并行程序的动态重构
鉴于来自不同来源的大量数据已可供多个领域的研究人员使用,数据科学已成为探索和从这些数据中获取价值的新范式。在这种情况下,人们提出了具有抽象编程接口的新的并行处理环境,如Spark,以试图简化分布式程序的开发。尽管这类解决方案已被广泛使用,但尽管它们使用了多种运行时策略,但使用它们实现最佳性能仍然并不总是直截了当的。在这项工作中,我们分析了此类系统中性能下降的一些原因,并基于该分析,我们提出了一种工具,通过基于先前执行的循环应用程序动态调整数据分区和并行度来提高性能。应用该方法的结果显示,所考虑的应用程序的执行时间持续减少,最多可减少50%。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信