StragglerHelper: Alleviating Straggling in Computing Clusters via Sharing Memory Access Patterns

Wenjie Liu, Ping Huang, Xubin He
2020 IEEE International Parallel and Distributed Processing Symposium (IPDPS), pp. 602-611. Published 2020-05-01.
DOI: 10.1109/IPDPS47924.2020.00068 (https://doi.org/10.1109/IPDPS47924.2020.00068)
Citations: 1

Abstract

Clusters have become a prevalent and successful computing framework for processing large amounts of data, owing to their distributed and parallel working paradigm. A task submitted to a cluster is typically divided into a number of subtasks, which are assigned to different work nodes that run the same code but process different, equally sized portions of the dataset. Because of heterogeneity, work nodes finish their subtasks at different rates, which can easily produce stragglers that unduly slow down the entire job. In this study, we aim to speed up straggling work nodes, and thereby the overall processing, by leveraging the observed performance variation. More specifically, we propose StragglerHelper, which conveys the memory access characteristics experienced by the forerunner to the stragglers so that the stragglers can be sped up through accurately informed memory prefetching. A Progress Monitor is deployed to supervise the progress of each work node and to relay the forerunner's memory access patterns to straggling nodes. Our evaluation with SPEC MPI 2007 and BigDataBench on a cluster of 64 work nodes shows that StragglerHelper improves the execution time of stragglers by up to 99.5%, with an average of 61.4%, contributing to an overall improvement for the entire cluster of up to 46.7%, with an average of 9.9%, compared to the baseline cluster.
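The monitoring-and-sharing idea described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the class names, the `lag_threshold` parameter, the prefetch `depth`, and the representation of an access pattern as a list of offsets relative to each node's own data partition are all assumptions made for illustration. The sketch also assumes that, because every node runs the same code, step i on a straggler touches the same relative offset that step i touched on the forerunner.

```python
from dataclasses import dataclass, field


@dataclass
class WorkNode:
    """Hypothetical work node; all fields are illustrative assumptions."""
    name: str
    progress: int = 0                           # subtask steps completed so far
    trace: list = field(default_factory=list)   # offsets this node has touched
    hints: list = field(default_factory=list)   # forerunner trace received from the monitor

    def prefetch_candidates(self, depth: int = 4) -> list:
        # Offsets the forerunner touched just past this node's current step.
        return self.hints[self.progress:self.progress + depth]


class ProgressMonitor:
    """Tracks node progress and forwards the forerunner's trace to stragglers."""

    def __init__(self, nodes, lag_threshold: int):
        self.nodes = nodes
        self.lag_threshold = lag_threshold      # assumed cutoff for "straggling"

    def forerunner(self) -> WorkNode:
        return max(self.nodes, key=lambda n: n.progress)

    def stragglers(self) -> list:
        lead = self.forerunner().progress
        return [n for n in self.nodes if lead - n.progress >= self.lag_threshold]

    def share_patterns(self) -> list:
        # Copy the forerunner's recorded trace to every straggling node.
        lead = self.forerunner()
        helped = self.stragglers()
        for node in helped:
            node.hints = list(lead.trace)
        return helped


fast = WorkNode("node-0", progress=8, trace=[0, 64, 128, 192, 256, 320, 384, 448])
slow = WorkNode("node-1", progress=2, trace=[0, 64])
monitor = ProgressMonitor([fast, slow], lag_threshold=4)
helped_names = [n.name for n in monitor.share_patterns()]
candidates = slow.prefetch_candidates()
print(helped_names, candidates)
```

In this toy run, `node-1` lags the forerunner by six steps, so it receives the forerunner's trace and can issue prefetches for the next offsets it is expected to touch. A real system would need to translate such hints into actual prefetch operations and handle cases where the access patterns diverge between nodes, neither of which the sketch attempts.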