Reproducible Scientific Workflows for High Performance and Cloud Computing

2019 19th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGRID), published 2019-05-14, DOI: 10.1109/CCGRID.2019.00028

Felix Bartusch, Maximilian Hanussek, Jens Krüger, O. Kohlbacher


Citations: 5

Abstract

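Many complex data analysis tasks are performed by scientific workflows and pipelines deployed on high performance computing (HPC) or cloud computing resources. The complex software stack required by a workflow and unnoticed dependencies can make the deployment of a pipeline a demanding task. Once deployed, workflows tend to be black boxes, especially for users who did not create the pipeline themselves. At the end of a project, a researcher should archive the pipeline in order to ensure reproducibility of published results. This paper illustrates a possible solution for each of the three tasks: reproducible deployment via software containers, automated generation of provenance information to break black boxes, and using the CiTAR service for archiving software containers.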
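The idea of automatically generated provenance information can be illustrated with a minimal sketch. It assumes that each workflow step is a command-line call and records the command, a label for the container image the step is meant to run in, UTC timestamps, the exit code, and SHA-256 checksums of the step's inputs and outputs. The function names `sha256_of` and `run_with_provenance`, the record layout, and the image label `example/pipeline:1.0` are hypothetical and not taken from the paper. The sketch also does not start a container itself; it only shows what a provenance record for one step might contain.

```python
import hashlib
import json
import subprocess
import sys
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def run_with_provenance(cmd, inputs, outputs, container_image):
    """Run one workflow step and return a provenance record (hypothetical schema)."""
    started = datetime.now(timezone.utc).isoformat()
    # Hash inputs before the step runs, so later changes to them are detectable.
    input_hashes = {str(p): sha256_of(Path(p)) for p in inputs}
    result = subprocess.run(cmd, capture_output=True, text=True)
    ended = datetime.now(timezone.utc).isoformat()
    # Hash only the declared outputs that actually exist after the step.
    output_hashes = {str(p): sha256_of(Path(p)) for p in outputs if Path(p).exists()}
    return {
        "command": [str(c) for c in cmd],
        "container_image": container_image,  # hypothetical label, ideally an immutable digest
        "started": started,
        "ended": ended,
        "exit_code": result.returncode,
        "inputs": input_hashes,
        "outputs": output_hashes,
    }


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "input.txt"
        dst = Path(tmp) / "output.txt"
        src.write_bytes(b"hello\n")
        # Stand-in step: copy input to output using the current Python interpreter.
        step = [sys.executable, "-c",
                "import shutil, sys; shutil.copy(sys.argv[1], sys.argv[2])",
                str(src), str(dst)]
        record = run_with_provenance(step, [src], [dst], "example/pipeline:1.0")
        print(json.dumps(record, indent=2))
```

Because the stand-in step only copies its input, the input and output checksums in the printed record match. Storing such records next to an archived container image is one plausible way to let a later reader check that a rerun used the same data and software environment.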




