Proceedings of the 2023 ACM Conference on Reproducibility and Replicability: Latest Articles

We Need More Reproducibility Content Across the Computer Science Curriculum
Fraida Fund
Pub Date: 2023-06-27 | DOI: 10.1145/3589806.3600033
Abstract: With increasing recognition of the importance of reproducibility in computer science research, a wide range of efforts to promote reproducible research have been implemented across various sub-disciplines of computer science. These include artifact review and badging processes, and dedicated reproducibility tracks at conferences. However, these initiatives primarily engage active researchers and students already involved in research in their respective areas. In this paper, we present an argument for expanding the scope of these efforts to include a much larger audience, by introducing more reproducibility content into computer science courses. We describe various ways to integrate reproducibility content into the curriculum, drawing on our own experiences, as well as published experience reports from several sub-disciplines of computer science and computational science.
Citations: 0

On Reporting Robust and Trustworthy Conclusions from Model Comparison Studies Involving Neural Networks and Randomness
Odd Erik Gundersen, Saeid Shamsaliei, H. S. Kjærnli, H. Langseth
Pub Date: 2023-06-27 | DOI: 10.1145/3589806.3600044
Abstract: The performance of neural networks differs when the only difference between training runs is the seed that initializes the pseudo-random number generator used during training. In this paper we are concerned with how random initialization affects the conclusions that we draw from experiments with neural networks. We run a large number of repeated experiments using state-of-the-art models for time-series prediction and image classification to investigate this statistical phenomenon. Our investigations show that erroneous conclusions can easily be drawn from such experiments. Based on these observations, we propose several measures that will improve the robustness and trustworthiness of conclusions inferred from model comparison studies with small absolute effect sizes.
Citations: 1

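The seed-sensitivity phenomenon this paper studies can be illustrated with a toy simulation (the "training runs" below are simulated draws, not the authors' actual experiments): when the true effect between two models is small relative to seed-induced variance, a comparison on any single seed can easily rank the truly worse model first.

```python
import random
import statistics

def simulated_accuracy(model_bias: float, seed: int) -> float:
    """Stand-in for one training run: accuracy = true quality + seed noise."""
    rng = random.Random(seed)
    return model_bias + rng.gauss(0, 0.02)

# Model B is truly 0.5 accuracy points better than model A
# (a small absolute effect size, as in the paper's setting).
runs_a = [simulated_accuracy(0.900, seed) for seed in range(50)]
runs_b = [simulated_accuracy(0.905, seed + 1000) for seed in range(50)]

print(f"A: mean={statistics.mean(runs_a):.4f} sd={statistics.stdev(runs_a):.4f}")
print(f"B: mean={statistics.mean(runs_b):.4f} sd={statistics.stdev(runs_b):.4f}")

# Count seed pairings where the truly worse model A comes out ahead.
single_seed_flips = sum(a > b for a, b in zip(runs_a, runs_b))
print(f"Pairings where the worse model A wins: {single_seed_flips}/50")
```

Aggregating over many seeds recovers the true ordering, while a single-seed comparison flips it a large fraction of the time, which is why the paper argues for repeated runs before reporting conclusions.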
Towards Reproducible Execution of Closed-Source Applications from Internet Archives
M. Satyanarayanan, J. Harkes, J. Blakley
Pub Date: 2023-06-27 | DOI: 10.1145/3589806.3600035
Abstract: Olive enables execution of closed-source applications decades after their creation. With appropriate authentication and authorization, anyone on the Internet can execute any archived application with no more effort than a mouse click. User experience is good, even for an interaction-intensive application. Olive uses virtual machine (VM) technology to encapsulate legacy software, including the operating system and all layers above it. If the legacy hardware is already obsolete at curation time, an emulator for it on more modern hardware can be included within the VM image. This paper is an experience report on the decade-long evolution of this concept.
Citations: 0

Automatic Reproduction of Workflows in the Snakemake Workflow Catalog and nf-core Registries
Samuel Grayson, D. Marinov, Daniel S. Katz, Reed Milewicz
Pub Date: 2023-06-27 | DOI: 10.1145/3589806.3600037
Abstract: Workflows make it easier for scientists to assemble computational experiments consisting of many disparate components. However, those disparate components also increase the probability that the computational experiment fails to be reproducible. Even if software is reproducible today, it may become irreproducible tomorrow without the software itself changing at all, because of the constantly changing software environment in which the software is run. To alleviate irreproducibility, workflow engines integrate with container engines. Additionally, communities that sprang up around workflow engines have started to host registries for workflows that follow standards. These standards reduce the effort needed to make workflows automatically reproducible. In this paper, we study automatic reproduction of workflows from two registries, focusing on non-crashing executions. The experimental data lets us analyze the upper bound to which workflow engines could achieve reproducibility. We identify lessons learned in achieving reproducibility in practice.
Citations: 1

KheOps: Cost-effective Repeatability, Reproducibility, and Replicability of Edge-to-Cloud Experiments
Daniel Rosendo, K. Keahey, Alexandru Costan, Matthieu Simonin, P. Valduriez, Gabriel Antoniu
Pub Date: 2023-06-27 | DOI: 10.1145/3589806.3600032
Abstract: Distributed infrastructures for computation and analytics are now evolving towards an interconnected ecosystem allowing complex scientific workflows to be executed across hybrid systems spanning from IoT Edge devices to Clouds, and sometimes to supercomputers (the Computing Continuum). Understanding the performance trade-offs of large-scale workflows deployed on such a complex Edge-to-Cloud Continuum is challenging. To achieve this, one needs to systematically perform experiments, to enable their reproducibility, and to allow other researchers to replicate the study and the obtained conclusions on different infrastructures. This breaks down to the tedious process of reconciling the numerous experimental requirements and constraints with low-level infrastructure design choices. To address the limitations of the main state-of-the-art approaches for distributed, collaborative experimentation, such as Google Colab, Kaggle, and Code Ocean, we propose KheOps, a collaborative environment specifically designed to enable cost-effective reproducibility and replicability of Edge-to-Cloud experiments. KheOps is composed of three core elements: (1) an experiment repository; (2) a notebook environment; and (3) a multi-platform experiment methodology. We illustrate KheOps with a real-life Edge-to-Cloud application. The evaluations explore the point of view of the authors of an experiment described in an article (who aim to make their experiments reproducible) and the perspective of their readers (who aim to replicate the experiment). The results show how KheOps helps authors to systematically perform repeatable and reproducible experiments on the Grid5000 + FIT IoT LAB testbeds. Furthermore, KheOps helps readers to cost-effectively replicate the authors' experiments on different infrastructures such as the Chameleon Cloud + CHI@Edge testbeds, and to obtain the same conclusions with high accuracy (> 88% for all performance metrics).
Citations: 0

Towards Evidence-Based Software Quality Practices for Reproducibility: Preliminary Results and Research Directions
Reed Milewicz, Miranda R. Mundt
Pub Date: 2023-06-27 | DOI: 10.1145/3589806.3600040
Abstract: In the computational science and engineering (CSE) community, there is a prevailing belief that adopting better software development practices and investing in software quality will directly lead to more robust, reproducible software. There is, however, relatively little evidence to indicate what specific aspects of software quality influence reproducibility or which practices lead to more reproducible software. To better inform this discussion, we present preliminary findings from an ongoing study of how software quality practices among CSE teams affect the reproducibility of the software they create.
Citations: 0

GTP Benchmarks for Gradual Typing Performance
B. Greenman
Pub Date: 2023-06-27 | DOI: 10.1145/3589806.3600034
Abstract: Reproducible, rigorous experiments are key to effective computing research because they provide grounding and a way to measure progress. Gradual typing is an emerging area that desperately needs such grounding. A gradual language lets programmers add types to part of a codebase while leaving the rest untyped. The critical research question is how to balance the guarantees that types provide against the run-time cost of enforcing them. Either weaker guarantees or better implementation methods could lead to answers, but without benchmarks for reproducibility there is no sound way to evaluate competing designs. The GTP Benchmark Suite is a rigorous testbed for gradual typing that supports reproducible experiments. Starting from a core suite of 21 programs drawn from a variety of applications, it enables the systematic exploration of over 40K gradually-typed program configurations via software for managing experiments and for analyzing results. Language designers have used the benchmarks to evaluate implementation strategies in at least seven major efforts since 2014. Furthermore, the benchmarks have proven useful for broader topics in gradual typing.
Citations: 1

A Siren Song of Open Source Reproducibility, Examples from Machine Learning
Edward Raff, Andrew L. Farris
Pub Date: 2023-06-27 | DOI: 10.1145/3589806.3600042
Abstract: As reproducibility becomes a greater concern, conferences have largely converged to a strategy of asking reviewers to indicate whether code was attached to a submission. This represents a broader pattern of implementing actions based on presumed ideals, without studying whether those actions will produce positive results. We argue that focusing on code as a means of reproduction is misguided if we want to improve the state of reproducible and replicable research. In this study, we find this focus on code may be harmful; we should not force code to be submitted. Furthermore, there is a lack of evidence that conferences take effective actions to encourage and reward reproducibility. We argue that venues must take more action to advance reproducible machine learning research today.
Citations: 2

Integrated Reproducibility with Self-describing Machine Learning Models
J. Wonsil, J. Sullivan, Margo Seltzer, A. Pocock
Pub Date: 2023-06-27 | DOI: 10.1145/3589806.3600039
Abstract: Researchers and data scientists frequently want to collaborate on machine learning models. However, in the presence of sharing and simultaneous experimentation, it is challenging both to determine whether two models were trained identically and to precisely reproduce someone else's training process. We demonstrate how provenance collection that is tightly integrated into a machine learning library facilitates reproducibility. We present MERIT, a reproducibility system that leverages a robust configuration system and extensive provenance collection to exactly reproduce models, given only a model object. We integrate MERIT with Tribuo, an open-source Java-based machine learning library. Key features of this integrated reproducibility framework include controlling for sources of non-determinism in a multi-threaded environment and exposing the training differences between two models in a human-readable form. Our system allows simple reproduction of deployed Tribuo models without any additional information, ensuring data science research is reproducible. Our framework is open-source and available under an Apache 2.0 license.
Citations: 0

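The core idea of reproducing a model "given only a model object" can be sketched in a few lines: embed every input needed to rerun training (configuration, seed, environment details) inside the model itself. This is a minimal illustration of the concept only; the function names and the toy training routine are hypothetical, not MERIT's or Tribuo's actual API.

```python
import platform
import random

def train(config: dict) -> dict:
    """Toy 'training': seeded random weights stand in for a real learner."""
    rng = random.Random(config["seed"])
    return {"weights": [rng.random() for _ in range(config["n_weights"])]}

def train_with_provenance(config: dict) -> dict:
    """Attach to the model everything needed to repeat the training run."""
    model = train(config)
    model["provenance"] = {
        "config": dict(config),
        "python_version": platform.python_version(),
    }
    return model

def reproduce(model: dict) -> dict:
    """Retrain from the model object alone, using its embedded provenance."""
    return train_with_provenance(model["provenance"]["config"])

m1 = train_with_provenance({"seed": 42, "n_weights": 4})
m2 = reproduce(m1)
assert m1["weights"] == m2["weights"]  # identical retraining from provenance
```

A real system must additionally control non-determinism that a config alone does not capture (thread scheduling, library versions, hardware), which is exactly the harder problem the paper addresses.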
Fingerprinting and Building Large Reproducible Datasets
Romain Lefeuvre, Jessie Galasso, B. Combemale, H. Sahraoui, Stefano Zacchiroli
Pub Date: 2023-06-20 | DOI: 10.1145/3589806.3600043
Abstract: Obtaining a relevant dataset is central to conducting empirical studies in software engineering. However, in the context of mining software repositories, the lack of appropriate tooling for large-scale mining tasks hinders the creation of new datasets. Moreover, limitations related to data sources that change over time (e.g., code bases) and the lack of documentation of extraction processes make it difficult to reproduce datasets over time. This threatens the quality and reproducibility of empirical studies. In this paper, we propose a tool-supported approach facilitating the creation of large tailored datasets while ensuring their reproducibility. We leveraged all the sources feeding the Software Heritage append-only archive, which are accessible through a unified programming interface, to outline a reproducible and generic extraction process. We propose a way to define a unique fingerprint to characterize a dataset which, when provided to the extraction process, ensures that the same dataset will be extracted. We demonstrate the feasibility of our approach by implementing a prototype. We show how it can help reduce the limitations researchers face when creating or reproducing datasets.
Citations: 0

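The general idea behind a dataset fingerprint can be sketched simply: hash a canonical rendering of the extraction parameters, so the same specification always maps to the same identifier. This is a generic illustration, not the paper's actual scheme; the field names in the example spec are hypothetical.

```python
import hashlib
import json

def dataset_fingerprint(spec: dict) -> str:
    """Hash a canonical JSON rendering of the extraction parameters.

    sort_keys and fixed separators make the rendering independent of
    dict ordering and whitespace, so equal specs hash identically.
    """
    canonical = json.dumps(spec, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

# Hypothetical extraction spec: an archive snapshot plus selection criteria.
spec = {
    "archive_snapshot": "2023-06-01",
    "origin_pattern": "https://github.com/*",
    "language": "Java",
}
fp = dataset_fingerprint(spec)

# Key order must not matter: a reordered spec yields the same fingerprint.
assert fp == dataset_fingerprint(dict(reversed(list(spec.items()))))
print(fp)
```

Pinning the spec to a snapshot of an append-only archive (as the paper does with Software Heritage) is what makes the fingerprint meaningful over time: the same fingerprint re-extracts the same dataset even as the live sources change.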