ProML: A Decentralised Platform for Provenance Management of Machine Learning Software Systems

European Conference on Software Architecture. Pub Date: 2022-06-21. DOI: 10.48550/arXiv.2206.10110

N. Tran, Bushra Sabir, M. A. Babar, Nini Cui, M. Abolhasan, J. Lipman


Citations: 2

Abstract



Large-scale Machine Learning (ML) based software systems are increasingly developed by distributed teams situated in different trust domains. Insider threats can launch attacks from any domain to compromise ML assets (models and datasets). Therefore, practitioners require information about how and by whom ML assets were developed in order to assess quality attributes such as security, safety, and fairness. Unfortunately, it is challenging for ML teams to access and reconstruct such historical information about ML assets (ML provenance) because it is generally fragmented across distributed ML teams and threatened by the same adversaries that attack ML assets. This paper proposes ProML, a decentralised platform that leverages blockchain and smart contracts to empower distributed ML teams to jointly manage a single source of truth about the provenance of circulated ML assets without relying on a third party, which is vulnerable to insider threats and presents a single point of failure. We propose a novel architectural approach called Artefact-as-a-State-Machine to leverage blockchain transactions and smart contracts for managing ML provenance information, and introduce a user-driven provenance capturing mechanism to integrate existing scripts and tools into ProML without compromising participants' control over their assets and toolchains. We evaluate the performance and overheads of ProML by benchmarking a proof-of-concept system on a global blockchain. Furthermore, we assess ProML's security against a threat model of a distributed ML workflow.
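The abstract names the Artefact-as-a-State-Machine idea but gives no implementation details, so the following is only an illustrative sketch of the general notion: an ML asset is modelled as a state machine, and every state change is appended to a hash-chained log. The state names, the `Artefact` class, the `TRANSITIONS` table, and the in-memory log (a stand-in for blockchain transactions) are all assumptions made for illustration and are not taken from ProML.

```python
import hashlib
import json
from dataclasses import dataclass, field

# Hypothetical lifecycle states and allowed moves; the abstract does not name any.
TRANSITIONS = {
    "registered": {"trained"},
    "trained": {"evaluated"},
    "evaluated": {"deployed", "trained"},
    "deployed": set(),
}


def _digest(body: dict) -> str:
    """SHA-256 over a canonical JSON encoding of a record body."""
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()


@dataclass
class Artefact:
    """An ML asset whose provenance is the ordered list of its state transitions."""
    asset_id: str
    state: str = "registered"
    log: list = field(default_factory=list)  # in-memory stand-in for on-chain records

    def transition(self, new_state: str, actor: str) -> dict:
        """Record a state change, rejecting moves the state machine does not allow."""
        if new_state not in TRANSITIONS[self.state]:
            raise ValueError(f"{self.state} -> {new_state} is not allowed")
        record = {
            "asset": self.asset_id,
            "actor": actor,
            "from": self.state,
            "to": new_state,
            "prev": self.log[-1]["hash"] if self.log else "0" * 64,
        }
        record["hash"] = _digest(record)  # hash covers the body plus the previous hash
        self.log.append(record)
        self.state = new_state
        return record

    def verify(self) -> bool:
        """Recompute the hash chain and report whether the log is intact."""
        prev = "0" * 64
        for record in self.log:
            body = {k: v for k, v in record.items() if k != "hash"}
            if record["prev"] != prev or _digest(body) != record["hash"]:
                return False
            prev = record["hash"]
        return True


model = Artefact("model-v1")
model.transition("trained", actor="team-a")
model.transition("evaluated", actor="team-b")
print(model.state, model.verify())  # evaluated True

model.log[0]["actor"] = "mallory"   # tamper with the recorded history
print(model.verify())               # False
```

In this sketch, "how and by whom" an asset was developed is answered by replaying the log, and tampering with any earlier record invalidates the chain. A real deployment would rely on smart contracts and the blockchain's own consensus for these guarantees rather than a local list.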

