交互式和可重复分析的容器化研究综述

Gregory J. Hunt, Johann A. Gagnon-Bartsch
{"title":"交互式和可重复分析的容器化研究综述","authors":"Gregory J. Hunt, Johann A. Gagnon-Bartsch","doi":"10.52933/jdssv.v3i1.53","DOIUrl":null,"url":null,"abstract":"In recent decades the analysis of data has become increasingly computational. Correspondingly, this has changed how scientific and statistical work is shared. For example, it is now commonplace for underlying analysis code and data to be proffered alongside journal publications and conference talks. Unfortunately, sharing code faces several challenges. First, it is often difficult to take code from one computer and run it on another. Code configuration, version, and dependency issues often make this challenging. Secondly, even if the code runs, it is often hard to understand or interact with the analysis. This makes it difficult to assess the code and its findings, for example, in a peer review process. In this review we describe the combination of two computing technologies that help make analyses shareable, interactive, and completely reproducible. These technologies are (1) analysis containerization, which leverages virtualization to fully encapsulate analysis, data, code and dependencies into an interactive and shareable format, and (2) code notebooks, a literate programming format for interacting with analyses. The fusion of these two technologies offers significant advantages over using either individually. This review surveys how the combination enhances the accessibility and reproducibility of code, analyses, and ideas.","PeriodicalId":93459,"journal":{"name":"Journal of data science, statistics, and visualisation","volume":"16 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2021-03-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"A Review of Containerization for Interactive and Reproducible Analysis\",\"authors\":\"Gregory J. Hunt, Johann A. Gagnon-Bartsch\",\"doi\":\"10.52933/jdssv.v3i1.53\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"In recent decades the analysis of data has become increasingly computational. Correspondingly, this has changed how scientific and statistical work is shared. For example, it is now commonplace for underlying analysis code and data to be proffered alongside journal publications and conference talks. Unfortunately, sharing code faces several challenges. First, it is often difficult to take code from one computer and run it on another. Code configuration, version, and dependency issues often make this challenging. Secondly, even if the code runs, it is often hard to understand or interact with the analysis. This makes it difficult to assess the code and its findings, for example, in a peer review process. In this review we describe the combination of two computing technologies that help make analyses shareable, interactive, and completely reproducible. These technologies are (1) analysis containerization, which leverages virtualization to fully encapsulate analysis, data, code and dependencies into an interactive and shareable format, and (2) code notebooks, a literate programming format for interacting with analyses. The fusion of these two technologies offers significant advantages over using either individually. This review surveys how the combination enhances the accessibility and reproducibility of code, analyses, and ideas.\",\"PeriodicalId\":93459,\"journal\":{\"name\":\"Journal of data science, statistics, and visualisation\",\"volume\":\"16 1\",\"pages\":\"\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2021-03-30\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Journal of data science, statistics, and visualisation\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.52933/jdssv.v3i1.53\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of data science, statistics, and visualisation","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.52933/jdssv.v3i1.53","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

摘要

近几十年来,数据分析越来越具有计算性。相应地,这也改变了科学和统计工作的共享方式。例如,现在将基础分析代码和数据与期刊出版物和会议演讲一起提供是很常见的。不幸的是,共享代码面临着几个挑战。首先,通常很难从一台计算机中取出代码并在另一台计算机上运行。代码配置、版本和依赖关系问题常常使这一工作具有挑战性。其次,即使代码运行,通常也很难理解或与分析交互。这使得评估代码及其发现变得困难,例如,在同行评审过程中。在这篇综述中,我们描述了两种计算技术的结合,这两种技术有助于使分析具有可共享性、交互性和完全可重复性。这些技术是:(1)分析容器化,它利用虚拟化将分析、数据、代码和依赖关系完全封装成一种交互式和可共享的格式,以及(2)代码笔记本,一种用于与分析交互的文字编程格式。这两种技术的融合比单独使用任何一种技术都有显著的优势。这篇综述调查了这种组合如何增强代码、分析和思想的可访问性和可再现性。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
A Review of Containerization for Interactive and Reproducible Analysis
In recent decades the analysis of data has become increasingly computational. Correspondingly, this has changed how scientific and statistical work is shared. For example, it is now commonplace for underlying analysis code and data to be proffered alongside journal publications and conference talks. Unfortunately, sharing code faces several challenges. First, it is often difficult to take code from one computer and run it on another. Code configuration, version, and dependency issues often make this challenging. Secondly, even if the code runs, it is often hard to understand or interact with the analysis. This makes it difficult to assess the code and its findings, for example, in a peer review process. In this review we describe the combination of two computing technologies that help make analyses shareable, interactive, and completely reproducible. These technologies are (1) analysis containerization, which leverages virtualization to fully encapsulate analysis, data, code and dependencies into an interactive and shareable format, and (2) code notebooks, a literate programming format for interacting with analyses. The fusion of these two technologies offers significant advantages over using either individually. This review surveys how the combination enhances the accessibility and reproducibility of code, analyses, and ideas.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信