Balancing Thread-Level and Task-Level Parallelism for Data-Intensive Workloads on Clusters and Clouds

Olivia Choudhury, D. Rajan, Nicholas L. Hazekamp, S. Gesing, D. Thain, S. Emrich
2015 IEEE International Conference on Cluster Computing, published 2015-09-08. DOI: 10.1109/CLUSTER.2015.60 (https://doi.org/10.1109/CLUSTER.2015.60)
Citations: 6

Abstract

The runtime configuration of parallel and distributed applications remains a mysterious art. To tune an application on a particular system, the end user must choose the number of machines, the number of cores per task, the data partitioning strategy, and so on, which together produce a combinatorial explosion of choices. While one might try to exhaustively evaluate all choices in search of the optimal one, the end user's goal is simply to run the application once with reasonable performance by avoiding terrible configurations. To address this problem, we present a hybrid technique based on regression models for tuning data-intensive bioinformatics applications: the sequential computational kernel is characterized empirically and then incorporated into an ab initio model of the distributed system. We demonstrate this technique on the commonly used applications BWA, Bowtie2, and BLASR and validate the accuracy of our proposed models on clouds and clusters.
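The abstract describes the hybrid approach only at a high level. The following is a minimal sketch of the general idea, not the authors' actual models. It assumes a hypothetical kernel form T(n, t) = a*n/t + b*n + c (n = items per task, t = threads per task), fit by least squares to timings of the sequential kernel. It also assumes a hypothetical ab initio cluster model in which equal-sized partitions run in waves over machines * floor(cores / threads) task slots. The function names `fit_kernel`, `predict_makespan`, and `best_config` and the per-item transfer cost are illustrative assumptions.

```python
import math
from itertools import product


def solve_linear(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def features(n, t):
    # Hypothetical kernel form: T(n, t) = a*n/t + b*n + c, i.e. a thread-parallel
    # part, a serial per-item part, and a fixed startup cost.
    return [n / t, n, 1.0]


def fit_kernel(samples):
    """Least-squares fit of (a, b, c) from (items, threads, seconds) measurements."""
    rows = [features(n, t) for n, t, _ in samples]
    ys = [y for _, _, y in samples]
    k = len(rows[0])
    # Normal equations: (X^T X) beta = X^T y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(k)]
    return solve_linear(xtx, xty)


def predict_kernel(coef, n, t):
    return sum(c * f for c, f in zip(coef, features(n, t)))


def predict_makespan(coef, total_items, partitions, machines, cores, threads,
                     transfer_per_item=0.0):
    """Hypothetical ab initio model: equal partitions run in waves over task slots."""
    if threads > cores:
        return math.inf  # a task cannot use more threads than a machine has cores
    slots = machines * (cores // threads)  # tasks that can run concurrently
    waves = math.ceil(partitions / slots)
    items = total_items / partitions
    per_task = predict_kernel(coef, items, threads) + transfer_per_item * items
    return waves * per_task


def best_config(coef, total_items, cores, machine_choices, partition_choices,
                thread_choices, transfer_per_item=0.0):
    """Score every (machines, partitions, threads) choice with the model; no real runs."""
    return min(
        product(machine_choices, partition_choices, thread_choices),
        key=lambda cfg: predict_makespan(
            coef, total_items, partitions=cfg[1], machines=cfg[0], cores=cores,
            threads=cfg[2], transfer_per_item=transfer_per_item),
    )


if __name__ == "__main__":
    # Synthetic timings for illustration only; n is in thousands of reads (hypothetical unit).
    samples = [(n, t, 2.0 * n / t + 0.1 * n + 5.0)
               for n in (10, 20, 40) for t in (1, 2, 4, 8)]
    coef = fit_kernel(samples)
    print("fitted (a, b, c):", [round(c, 3) for c in coef])
    print("best (machines, partitions, threads):",
          best_config(coef, total_items=80, cores=8, machine_choices=(1, 2),
                      partition_choices=(1, 2, 4, 8), thread_choices=(1, 2, 4, 8)))
```

Because the fitted model is cheap to evaluate, scoring a whole configuration grid costs far less than a single real run. This suits the stated goal of avoiding terrible configurations rather than pinpointing the exact optimum. Measuring input size in larger units (thousands of reads in the example) keeps the normal equations well-conditioned.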