Evaluating the average reproducibility cost of the scientific workflows

Published in: 2016 IEEE 14th International Symposium on Intelligent Systems and Informatics (SISY); publication date: 2016-08-01; DOI: 10.1109/SISY.2016.7601475

Anna Bánáti, P. Kárász, P. Kacsuk, M. Kozlovszky


Citation count: 3

Abstract

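Using scientific workflows to perform in-silico experiments is an increasingly common approach in scientific communities. Because scientific workflows are data- and compute-intensive, they must be executed on parallel and distributed systems (grids, clusters, clouds and supercomputers). However, the complexity of these infrastructures and their continuously changing environment significantly hinder, or even prevent, repeatability and reproducibility, which are often needed to share results or to judge scientific claims. The data and parameters required for re-execution can originate from different sources (infrastructural, third-party, or related to the binaries), and these may change or become unavailable over the years. Our ultimate goal is to compensate for missing original parameters by replacing, evaluating or simulating the values of the parameters in question. To create these methods, we determined the levels of re-execution and defined a descriptor space that collects all the parameters needed for reproducibility. Although these procedures incur extra cost, the average reproducibility cost can be computed exactly or, where that is not possible, evaluated approximately. In this paper we give a method for evaluating the average cost of making a workflow reproducible when exact computation is not possible.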
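The abstract does not spell out the paper's method, so the following is only a minimal sketch of the general distinction it draws between computing an average cost exactly and evaluating it approximately. It assumes a hypothetical model in which each descriptor (`Descriptor`, with the made-up fields `p_unavailable` and `compensation_cost`) independently becomes unavailable with some probability and has a fixed cost to replace, evaluate or simulate. Under additive costs the expected cost has a closed form; for a non-additive cost function (the hypothetical `coupled_cost`) the sketch falls back to a Monte Carlo estimate. All names and numbers are illustrative assumptions, not taken from the paper.

```python
import random
from dataclasses import dataclass
from typing import Callable, FrozenSet, Sequence


@dataclass(frozen=True)
class Descriptor:
    """Hypothetical descriptor-space entry: one parameter needed for re-execution."""

    name: str
    p_unavailable: float      # assumed probability that the value changes or disappears
    compensation_cost: float  # assumed cost of replacing, evaluating or simulating it


def exact_average_cost(descriptors: Sequence[Descriptor]) -> float:
    """Exact expected cost, valid only if failures are independent and costs add up:
    E[C] = sum_i p_i * c_i (linearity of expectation)."""
    return sum(d.p_unavailable * d.compensation_cost for d in descriptors)


def estimated_average_cost(
    descriptors: Sequence[Descriptor],
    cost_of: Callable[[FrozenSet[str]], float],
    samples: int = 100_000,
    seed: int = 0,
) -> float:
    """Monte Carlo estimate of E[cost_of(missing)], where `missing` is the random set
    of unavailable descriptors; `cost_of` may be non-additive."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(samples):
        # Draw independently which descriptors are unavailable in this scenario.
        missing = frozenset(d.name for d in descriptors if rng.random() < d.p_unavailable)
        total += cost_of(missing)
    return total / samples


# Illustrative numbers only; they are not taken from the paper.
descriptors = [
    Descriptor("os_version", 0.30, 5.0),
    Descriptor("third_party_service", 0.10, 20.0),
    Descriptor("binary_checksum", 0.05, 8.0),
]
unit_cost = {d.name: d.compensation_cost for d in descriptors}


def additive_cost(missing: FrozenSet[str]) -> float:
    """Total cost when each missing descriptor is compensated independently."""
    return sum(unit_cost[name] for name in missing)


def coupled_cost(missing: FrozenSet[str]) -> float:
    """Hypothetical interaction: losing both the OS version and the binary
    checksum forces a full environment rebuild costing 30 instead of 5 + 8."""
    if {"os_version", "binary_checksum"} <= missing:
        return additive_cost(missing) - 13.0 + 30.0
    return additive_cost(missing)


print(exact_average_cost(descriptors))
print(estimated_average_cost(descriptors, additive_cost))
print(estimated_average_cost(descriptors, coupled_cost))
```

The first value is 0.3·5 + 0.1·20 + 0.05·8 = 3.9, and the sampled estimate for the additive cost should land close to it. Only the coupled cost, where the simple sum no longer applies, actually needs the sampling estimate; its true expectation here is 3.9 + 0.3·0.05·17 = 4.155.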


