Balancing Thread-Level and Task-Level Parallelism for Data-Intensive Workloads on Clusters and Clouds

2015 IEEE International Conference on Cluster Computing. Pub Date: 2015-09-08. DOI: 10.1109/CLUSTER.2015.60

Olivia Choudhury, D. Rajan, Nicholas L. Hazekamp, S. Gesing, D. Thain, S. Emrich


Citations: 6

Abstract



The runtime configuration of parallel and distributed applications remains a mysterious art. To tune an application on a particular system, the end user must choose the number of machines, the number of cores per task, the data partitioning strategy, and so on, all of which result in a combinatorial explosion of choices. While one might try to exhaustively evaluate all choices in search of the optimal, the end user's goal is simply to run the application once with reasonable performance by avoiding terrible configurations. To address this problem, we present a hybrid technique based on regression models for tuning data-intensive bioinformatics applications: the sequential computational kernel is characterized empirically and then incorporated into an ab initio model of the distributed system. We demonstrate this technique on the commonly used applications BWA, Bowtie2, and BLASR and validate the accuracy of our proposed models on clouds and clusters.
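The hybrid idea in the abstract (an empirically fitted model of the sequential kernel, plugged into an analytical model of the distributed run) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the linear form of the kernel regression, the Amdahl-style thread scaling, the wave-based makespan model, and all function and parameter names (`fit_linear`, `estimated_makespan`, `best_config`, `parallel_fraction`) are hypothetical.

```python
import math


def fit_linear(sizes, runtimes):
    """Ordinary least squares fit of runtime = a + b * size; returns (a, b).

    Assumption: the sequential kernel's runtime grows linearly with input size.
    """
    n = len(sizes)
    mean_x = sum(sizes) / n
    mean_y = sum(runtimes) / n
    sxx = sum((x - mean_x) ** 2 for x in sizes)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(sizes, runtimes))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def estimated_makespan(total_size, n_tasks, threads_per_task, total_cores,
                       a, b, parallel_fraction):
    """Hypothetical analytical model of the distributed run.

    The input is split evenly into n_tasks chunks. Each task's sequential time
    comes from the fitted regression and is scaled by Amdahl's law for its
    threads. Tasks execute in waves of (total_cores // threads_per_task).
    """
    chunk = total_size / n_tasks
    sequential_time = a + b * chunk
    task_time = sequential_time * (
        (1 - parallel_fraction) + parallel_fraction / threads_per_task
    )
    concurrent_tasks = total_cores // threads_per_task
    waves = math.ceil(n_tasks / concurrent_tasks)
    return waves * task_time


def best_config(total_size, total_cores, a, b, parallel_fraction,
                task_counts, thread_counts):
    """Evaluate the model over candidate configurations; return the cheapest.

    Returns a tuple (makespan, n_tasks, threads_per_task).
    """
    candidates = [
        (estimated_makespan(total_size, t, p, total_cores,
                            a, b, parallel_fraction), t, p)
        for t in task_counts
        for p in thread_counts
        if p <= total_cores
    ]
    return min(candidates)


# Made-up measurements of the sequential kernel: (input size, runtime).
a, b = fit_linear([1, 2, 3], [3, 5, 7])
makespan, n_tasks, threads = best_config(
    total_size=100, total_cores=8, a=a, b=b, parallel_fraction=0.5,
    task_counts=[1, 2, 4, 8, 16], thread_counts=[1, 2, 4, 8],
)
print(n_tasks, threads, makespan)
```

Because the model is cheap to evaluate, a sketch like this can rank every candidate pair of task count and threads per task without running the application, which matches the stated goal of avoiding terrible configurations rather than proving a single optimum.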


Source publication

2015 IEEE International Conference on Cluster Computing

Self-citation rate

0.00%

Publication count