Zara Hassan , Christoph Treude , Michael Norrish , Graham Williams , Alex Potanin
{"title":"Characterising reproducibility debt in scientific software: A systematic literature review","authors":"Zara Hassan , Christoph Treude , Michael Norrish , Graham Williams , Alex Potanin","doi":"10.1016/j.jss.2024.112327","DOIUrl":null,"url":null,"abstract":"<div><h3>Context:</h3><div>In scientific software, the inability to reproduce results is often due to technical issues and challenges in recreating the full computational workflow from the original analysis. We conceptualise this problem as <em>Reproducibility Debt</em> (RpD). Much research has been performed to propose solutions to tackle these issues across various computational science disciplines. It is essential to identify and accumulate existing knowledge on reproducibility issues and state-of-the-art solutions so as to provide researchers and practitioners with information that enables further research activities and RpD management in practice.</div></div><div><h3>Objective:</h3><div>In the context of scientific software, we aim to characterise RpD by providing a taxonomy of issues contributing towards its emergence and identification (causes, effects) and the common solutions discussed in the existing literature.</div></div><div><h3>Method:</h3><div>We conducted a systematic literature review, considering 2198 studies until January 2024, including 214 primary studies.</div></div><div><h3>Results:</h3><div>We propose the first taxonomy of RpD items consisting of 37 causes attributed towards its emergence, 63 corresponding effects under seven main categories, and 29 prevention strategies. We also identify 39 specialised tools/frameworks supporting reproducibility.</div></div><div><h3>Conclusion:</h3><div>The main contributions of this work are (1) a formal definition of RpD; (2) a taxonomy of issues contributing towards RpD; (3) a list of causes and effects having implications for software professionals to identify and measure RpD in their projects; (4) a list of strategies and tools to prevent or remove RpD; (5) the identification of gaps in existing research to guide future studies.</div></div>","PeriodicalId":51099,"journal":{"name":"Journal of Systems and Software","volume":"222 ","pages":"Article 112327"},"PeriodicalIF":3.7000,"publicationDate":"2024-12-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Systems and Software","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0164121224003716","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, SOFTWARE ENGINEERING","Score":null,"Total":0}
引用次数: 0
Abstract
Context:
In scientific software, the inability to reproduce results is often due to technical issues and challenges in recreating the full computational workflow from the original analysis. We conceptualise this problem as Reproducibility Debt (RpD). Much research has been performed to propose solutions to tackle these issues across various computational science disciplines. It is essential to identify and accumulate existing knowledge on reproducibility issues and state-of-the-art solutions so as to provide researchers and practitioners with information that enables further research activities and RpD management in practice.
Objective:
In the context of scientific software, we aim to characterise RpD by providing a taxonomy of issues contributing towards its emergence and identification (causes, effects) and the common solutions discussed in the existing literature.
Method:
We conducted a systematic literature review, considering 2198 studies until January 2024, including 214 primary studies.
Results:
We propose the first taxonomy of RpD items consisting of 37 causes attributed towards its emergence, 63 corresponding effects under seven main categories, and 29 prevention strategies. We also identify 39 specialised tools/frameworks supporting reproducibility.
Conclusion:
The main contributions of this work are (1) a formal definition of RpD; (2) a taxonomy of issues contributing towards RpD; (3) a list of causes and effects having implications for software professionals to identify and measure RpD in their projects; (4) a list of strategies and tools to prevent or remove RpD; (5) the identification of gaps in existing research to guide future studies.
期刊介绍:
The Journal of Systems and Software publishes papers covering all aspects of software engineering and related hardware-software-systems issues. All articles should include a validation of the idea presented, e.g. through case studies, experiments, or systematic comparisons with other approaches already in practice. Topics of interest include, but are not limited to:
•Methods and tools for, and empirical studies on, software requirements, design, architecture, verification and validation, maintenance and evolution
•Agile, model-driven, service-oriented, open source and global software development
•Approaches for mobile, multiprocessing, real-time, distributed, cloud-based, dependable and virtualized systems
•Human factors and management concerns of software development
•Data management and big data issues of software systems
•Metrics and evaluation, data mining of software development resources
•Business and economic aspects of software development processes
The journal welcomes state-of-the-art surveys and reports of practical experience for all of these topics.