Scientific Tests and Continuous Integration Strategies to Enhance Reproducibility in the Scientific Software Context

M. Krafczyk, A. Shi, A. Bhaskar, D. Marinov, V. Stodden
DOI: 10.1145/3322790.3330595
Published in: Proceedings of the 2nd International Workshop on Practical Reproducible Evaluation of Computer Systems, 2019-06-17
Citations: 23

Abstract

Continuous integration (CI) is a well-established technique in commercial and open-source software projects, although not routinely used in scientific publishing. In the scientific software context, CI can serve two functions to increase reproducibility of scientific results: providing an established platform for testing the reproducibility of these results, and demonstrating to other scientists how the code and data generate the published results. We explore scientific software testing and CI strategies using two articles published in the areas of applied mathematics and computational physics. We discuss lessons learned from reproducing these articles as well as examine and discuss existing tests. We introduce the notion of a "scientific test" as one that produces computational results from a published article. We then consider full result reproduction within a CI environment. If authors find their work too time or resource intensive to easily adapt to a CI context, we recommend the inclusion of results from reduced versions of their work (e.g., run at lower resolution, with shorter time scales, with smaller data sets) alongside their primary results within their article. While these smaller versions may be less interesting scientifically, they can serve to verify that published code and data are working properly. We demonstrate such reduction tests on the two articles studied.
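The "scientific test" and "reduction test" ideas above can be sketched as an ordinary unit test that re-runs a reduced version of a published computation in CI and checks it against a reference value from the article. The computation below (a midpoint-rule estimate of pi) is a hypothetical stand-in, not code from the two articles studied; a real scientific test would call the article's own code at lower resolution, shorter time scales, or on a smaller data set.

```python
import math

# Hypothetical reduced-scale computation standing in for a published
# result: midpoint-rule quadrature of 4/(1+x^2) over [0, 1], which
# converges to pi. In a real reduction test, this would be the
# article's own solver run at a coarser resolution.
def estimate_pi(n_points):
    h = 1.0 / n_points
    total = 0.0
    for i in range(n_points):
        x = (i + 0.5) * h          # midpoint of the i-th subinterval
        total += 4.0 / (1.0 + x * x)
    return total * h

def test_reduced_result_matches_reference():
    # The reference value would come from the published article; the
    # tolerance is loosened to match the coarser resolution used in CI.
    reference = math.pi
    result = estimate_pi(100)      # 100 points: cheap enough for every CI run
    assert abs(result - reference) < 1e-4

if __name__ == "__main__":
    test_reduced_result_matches_reference()
    print("reduced-scale scientific test passed")
```

Wired into a CI service, a test like this runs on every commit, so the published code and data are continuously verified to still produce (a scaled-down version of) the article's results.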