Evaluating Distributed Platforms for Protein-Guided Scientific Workflow

Proceedings of XSEDE16 : Diversity, Big Data, and Science at Scale : July 17-21, 2016, Intercontinental Miami Hotel, Miami, Florida, USA. Conference on Extreme Science and Engineering Discovery Environment (5th : 2016 : Miami, Fla.) Pub Date : 2014-07-13 DOI:10.1145/2616498.2616551

Natasha Pavlovikj, Kevin Begcy, S. Behera, Malachy T. Campbell, H. Walia, J. Deogun

Pages: 38:1-38:8

Abstract

Complex, large-scale applications in many scientific disciplines are often represented as workflows, that is, sets of independent tasks. Many scientific workflows have intensive resource requirements, so distributed platforms such as campus clusters, grids, and clouds are used to execute them efficiently. In this paper we examine the performance and cost of running the Pegasus Workflow Management System (Pegasus WMS) implementation of blast2cap3, the protein-guided assembly approach, on three execution platforms: Sandhills, the University of Nebraska campus cluster; the academic grid Open Science Grid (OSG); and the commercial cloud Amazon EC2. We also test the behavior of the blast2cap3 workflow with different numbers of tasks. For each workflow and execution platform, we perform multiple runs to compare the total workflow running time and the variation in resource availability over time. Additionally, for the most interesting runs, we analyze the number of running versus idle jobs over time on each platform. The experiments show that using the Pegasus WMS implementation of blast2cap3 with more than 100 tasks significantly reduces the running time on all execution platforms. In general, for our workflow, better performance and resource usage were achieved with Amazon EC2 as the execution platform. However, given the cost of Amazon EC2, academic distributed systems can sometimes be a good alternative and perform excellently, especially when plenty of resources are available.
