Integrated Reproducibility with Self-describing Machine Learning Models

J. Wonsil, J. Sullivan, Margo Seltzer, A. Pocock
Proceedings of the 2023 ACM Conference on Reproducibility and Replicability · Published 2023-06-27 · DOI: 10.1145/3589806.3600039

Abstract

Researchers and data scientists frequently want to collaborate on machine learning models. However, in the presence of sharing and simultaneous experimentation, it is challenging both to determine if two models were trained identically and to reproduce precisely someone else’s training process. We demonstrate how provenance collection that is tightly integrated into a machine learning library facilitates reproducibility. We present MERIT, a reproducibility system that leverages a robust configuration system and extensive provenance collection to exactly reproduce models, given only a model object. We integrate MERIT with Tribuo, an open-source Java-based machine learning library. Key features of this integrated reproducibility framework include controlling for sources of non-determinism in a multi-threaded environment and exposing the training differences between two models in a human-readable form. Our system allows simple reproduction of deployed Tribuo models without any additional information, ensuring data science research is reproducible. Our framework is open-source and available under an Apache 2.0 license.
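The core idea — a model object that carries enough provenance (seed, hyperparameters, configuration) to retrain itself exactly, plus a human-readable diff of two models' training configurations — can be illustrated with a minimal, self-contained sketch. This is not Tribuo's or MERIT's actual API; the class and method names (`SelfDescribingModel`, `reproduce`, `diffProvenance`) are hypothetical, chosen only to mirror the concepts in the abstract.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.Random;
import java.util.TreeMap;

// Hypothetical sketch of a "self-describing" model: the trained object
// carries the full provenance needed to retrain it bit-for-bit.
public class SelfDescribingModel {
    final Map<String, String> provenance; // configuration captured at training time
    final double[] weights;               // the "trained" parameters

    SelfDescribingModel(Map<String, String> provenance, double[] weights) {
        this.provenance = provenance;
        this.weights = weights;
    }

    // Deterministic training: all randomness flows from the recorded seed.
    static SelfDescribingModel train(long seed, int dim, double lr) {
        Map<String, String> prov = new TreeMap<>();
        prov.put("seed", Long.toString(seed));
        prov.put("dim", Integer.toString(dim));
        prov.put("learning-rate", Double.toString(lr));
        Random rng = new Random(seed);
        double[] w = new double[dim];
        for (int i = 0; i < dim; i++) {
            w[i] = lr * rng.nextGaussian();
        }
        return new SelfDescribingModel(prov, w);
    }

    // Reproduce a model given only the model object: replay its provenance.
    static SelfDescribingModel reproduce(SelfDescribingModel m) {
        return train(Long.parseLong(m.provenance.get("seed")),
                     Integer.parseInt(m.provenance.get("dim")),
                     Double.parseDouble(m.provenance.get("learning-rate")));
    }

    // Human-readable diff of the training configurations of two models.
    static String diffProvenance(SelfDescribingModel a, SelfDescribingModel b) {
        StringBuilder sb = new StringBuilder();
        for (String key : a.provenance.keySet()) {
            String va = a.provenance.get(key);
            String vb = b.provenance.get(key);
            if (!va.equals(vb)) {
                sb.append(key).append(": ").append(va)
                  .append(" -> ").append(vb).append('\n');
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        SelfDescribingModel m1 = train(42L, 4, 0.1);
        SelfDescribingModel m2 = reproduce(m1);
        System.out.println("identical weights: "
                + Arrays.equals(m1.weights, m2.weights));
        SelfDescribingModel m3 = train(42L, 4, 0.5);
        System.out.print(diffProvenance(m1, m3));
    }
}
```

The real system must do considerably more than this sketch: the paper notes that exact reproduction requires controlling sources of non-determinism in a multi-threaded training environment, which a single seeded `Random` does not capture.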