SWEET '12: Latest Publications

DAGwoman: enabling DAGMan-like workflows on non-Condor platforms
SWEET '12 Pub Date: 2012-05-20 DOI: 10.1145/2443416.2443419
Thomas Tschager, H. Schmidt
Abstract: Scientific analyses have grown more and more complex; thus, scientific workflows have gained much interest and importance for automating and handling complex analyses. Tools abound to ease the generation, handling and enactment of scientific workflows on distributed compute resources. Among the different workflow engines, DAGMan is widely available and supported by a number of tools. Unfortunately, users lack the possibility to use DAGMan if Condor is not installed. A new workflow engine, DAGwoman, is presented, which can be run in user space and allows running DAGMan-formatted workflows. Using an artificial workflow and two bioinformatics workflows, DAGwoman is compared to GridWay's GWDAG engine and to DAGMan based on Condor-G. Showing good results with respect to workflow-engine delay and feature richness, DAGwoman offers a complementary tool to efficiently run DAGMan workflows if Condor is not available.
Cited by: 5
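For context, a DAGMan-formatted workflow (the format DAGwoman accepts, per the abstract) is a plain-text file of JOB lines naming Condor submit files and PARENT...CHILD lines declaring dependencies. A minimal sketch; the job and file names here are hypothetical:

```
JOB  align   align.sub
JOB  merge   merge.sub
JOB  report  report.sub
PARENT align CHILD merge
PARENT merge CHILD report
```

With this file, `align` runs first, then `merge`, then `report`; an engine such as DAGMan or DAGwoman dispatches each job once its parents complete.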
Turbine: a distributed-memory dataflow engine for extreme-scale many-task applications
SWEET '12 Pub Date: 2012-05-20 DOI: 10.1145/2443416.2443421
J. Wozniak, Timothy G. Armstrong, K. Maheshwari, E. Lusk, D. Katz, M. Wilde, Ian T Foster
Abstract: Efficiently utilizing the rapidly increasing concurrency of multi-petaflop computing systems is a significant programming challenge. One approach is to structure applications with an upper layer of many loosely-coupled coarse-grained tasks, each comprising a tightly-coupled parallel function or program. "Many-task" programming models such as functional parallel dataflow may be used at the upper layer to generate massive numbers of tasks, each of which generates significant tightly-coupled parallelism at the lower level via multithreading, message passing, and/or partitioned global address spaces. At large scales, however, the management of task distribution, data dependencies, and inter-task data movement is a significant performance challenge. In this work, we describe Turbine, a new highly scalable and distributed many-task dataflow engine. Turbine executes a generalized many-task intermediate representation with automated self-distribution and is scalable to multi-petaflop infrastructures. We present the architecture of Turbine and its performance on highly concurrent systems.
Cited by: 32
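The many-task dataflow model the abstract describes can be illustrated with a toy scheduler: each task runs as soon as all of its dependencies have finished, and independent tasks run in parallel. This is a sketch of the general model only, not Turbine's API; all names are illustrative.

```python
from concurrent.futures import ThreadPoolExecutor

def run_dag(deps, work):
    """deps: task -> set of prerequisite tasks; work: task -> zero-arg callable.

    Repeatedly finds the tasks whose dependencies are all done and runs
    that wave in parallel, until every task has executed.
    """
    done, order = set(), []
    with ThreadPoolExecutor() as pool:
        while len(done) < len(deps):
            # A task is ready once all of its dependencies have finished.
            ready = [t for t in deps if t not in done and deps[t] <= done]
            list(pool.map(lambda t: work[t](), ready))  # one parallel wave
            done.update(ready)
            order.extend(sorted(ready))
    return order
```

A real engine like Turbine additionally distributes this scheduling itself across nodes; the point here is only the dependency-driven release of tasks.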
Makeflow: a portable abstraction for data intensive computing on clusters, clouds, and grids
SWEET '12 Pub Date: 2012-05-20 DOI: 10.1145/2443416.2443417
M. Albrecht, P. Donnelly, Peter Bui, D. Thain
Abstract: In recent years, there has been a renewed interest in languages and systems for large-scale distributed computing. Unfortunately, most systems available to the end user use a custom description language tightly coupled to a specific runtime implementation, making it difficult to transfer applications between systems. To address this problem we introduce Makeflow, a simple system for expressing and running a data-intensive workflow across multiple execution engines without requiring changes to the application or workflow description. Makeflow allows any user familiar with basic Unix Make syntax to generate a workflow and run it on one of many supported execution systems. Furthermore, to assess the performance characteristics of the various execution engines available to users, and to assist them in selecting one, we introduce Workbench, a suite of benchmarks designed for analyzing common workflow patterns. We evaluate Workbench on two physical architectures, using a variety of execution engines: the first a storage cluster with local disks and a slower network, and the second a high performance computing cluster with a central parallel filesystem and a fast network.

We conclude by demonstrating three applications that use Makeflow to execute data-intensive applications consisting of thousands of jobs.
Cited by: 145
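As the abstract notes, Makeflow rules use basic Unix Make syntax: outputs, a colon, inputs, then the command that produces the outputs. A minimal illustrative rule (the script and file names here are hypothetical):

```make
# outputs : inputs
# <tab> command
result.dat: simulate.py input.dat
	python simulate.py input.dat > result.dat
```

Because the rule declares its inputs and outputs explicitly, the same workflow file can be dispatched unchanged to a local machine, a batch cluster, or a grid.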
Evaluating parameter sweep workflows in high performance computing
SWEET '12 Pub Date: 2012-05-20 DOI: 10.1145/2443416.2443418
F. Chirigati, V. S. Sousa, Eduardo S. Ogasawara, Daniel de Oliveira, Jonas Dias, F. Porto, P. Valduriez, M. Mattoso
Abstract: Scientific experiments based on computer simulations can be defined, executed and monitored using Scientific Workflow Management Systems (SWfMS). Several SWfMS are available, each with a different goal and a different engine. Because their analyses are exploratory, scientists need to run parameter sweep (PS) workflows, i.e., workflows that are invoked repeatedly using different input data. These workflows generate a large number of tasks that are submitted to High Performance Computing (HPC) environments. Different execution models for a workflow may differ significantly in performance in HPC. However, selecting the best execution model for a given workflow is difficult, because many characteristics of the workflow may affect the parallel execution. We developed a study that shows the performance impact of using different execution models when running PS workflows in HPC. Our study contributes a characterization of PS workflow patterns (the basis for many existing scientific workflows) and of their behavior under different execution models in HPC. We evaluated four execution models for running workflows in parallel, measuring the performance of small, large and complex workflows under each. The results can be used as a guideline for selecting the best model for a given scientific workflow execution in HPC.

Our evaluation may also serve as a basis for workflow designers to analyze the expected behavior of an HPC workflow engine based on the characteristics of PS workflows.
Cited by: 15
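The core of a parameter sweep workflow is the expansion of a parameter grid into one independent task per value combination. A minimal sketch of that expansion; the function and parameter names are illustrative, not from the paper:

```python
from itertools import product

def expand_sweep(params):
    """Map {name: [values]} to a list of per-task parameter bindings."""
    keys = list(params)
    return [dict(zip(keys, combo))
            for combo in product(*(params[k] for k in keys))]

# Hypothetical sweep: 2 values of alpha x 2 values of steps = 4 tasks.
tasks = expand_sweep({"alpha": [0.1, 0.2], "steps": [100, 200]})
# Each dict in `tasks` would drive one independent workflow invocation.
```

How these generated tasks are then grouped and dispatched is exactly what the paper's four execution models vary.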
Oozie: towards a scalable workflow management system for Hadoop
SWEET '12 Pub Date: 2012-05-20 DOI: 10.1145/2443416.2443420
Mohammad Islam, Angelo K. Huang, Mohamed Battisha, Michelle Chiang, Santhosh Srinivasan, Craig Peters, A. Neumann, A. Abdelnur
Abstract: Hadoop is a massively scalable parallel computation platform capable of running hundreds of jobs concurrently, and many thousands of jobs per day. Managing all these computations demands a workflow and scheduling system. In this paper, we identify four indispensable qualities that a Hadoop workflow management system must fulfill, namely scalability, security, multi-tenancy, and operability. We find that conventional workflow management tools lack at least one of these qualities, and we therefore present Apache Oozie, a workflow management system specialized for Hadoop. We discuss the architecture of Oozie, share our production experience over the last few years at Yahoo, and evaluate Oozie's scalability and performance.
Cited by: 107
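Oozie workflows are declared as XML documents of actions linked by ok/error transitions. A minimal sketch of the general shape (the app name, action body, and schema version here are illustrative, not taken from the paper):

```xml
<workflow-app name="demo-wf" xmlns="uri:oozie:workflow:0.4">
    <start to="make-dir"/>
    <action name="make-dir">
        <!-- A filesystem action; real workflows chain MapReduce, Pig, etc. -->
        <fs>
            <mkdir path="${nameNode}/tmp/demo"/>
        </fs>
        <ok to="end"/>
        <error to="fail"/>
    </action>
    <kill name="fail">
        <message>Action failed</message>
    </kill>
    <end name="end"/>
</workflow-app>
```

The Oozie server walks this graph, submitting each action to Hadoop and following the `ok` or `error` transition based on the outcome.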