Evaluating the average reproducibility cost of the scientific workflows
Authors: Anna Bánáti, P. Kárász, P. Kacsuk, M. Kozlovszky
Published in: 2016 IEEE 14th International Symposium on Intelligent Systems and Informatics (SISY), 2016-08-01
DOI: https://doi.org/10.1109/SISY.2016.7601475
Applying scientific workflows to perform in-silico experiments is an increasingly prevalent solution among scientific communities. Because scientific workflows are data- and compute-intensive, parallel and distributed systems (grids, clusters, clouds, and supercomputers) are required to execute them. However, the complexity of these infrastructures and the continuously changing environment significantly encumber or even prevent the repeatability or reproducibility that is often needed for sharing results or judging scientific claims. The data and parameters necessary for re-execution can originate from different sources (infrastructural, third-party, or related to the binaries), which may change or become unavailable over the years. Our ultimate goal is to compensate for the lack of the original parameters by replacing, evaluating, or simulating the values of the parameters in question. To create these methods, we determined the levels of re-execution and defined a descriptor space to collect all the parameters needed for reproducibility. Although these procedures incur some extra cost, the average reproducibility cost can be computed or estimated. In this paper we give a method to evaluate the average cost of making a workflow reproducible when exact computation is not possible.