Scanflow: an end-to-end agent-based autonomic ML workflow manager for clusters

Proceedings of the 22nd International Middleware Conference: Demos and Posters. Published: 2021-12-06. DOI: 10.1145/3491086.3492468

Peini Liu, Gusseppe Bravo Rocca, Jordi Guitart, Ajay Dholakia, David Ellison, M. Hodak


Citations: 5

Abstract



Machine Learning (ML) is more than just training models; the whole life cycle must be considered. Once deployed, an ML model needs to be constantly managed, supervised, and debugged to guarantee its availability, validity, and robustness in dynamic contexts. This demonstration presents Scanflow, an agent-based ML workflow manager that enables autonomic management and supervision of the end-to-end life cycle of ML workflows on distributed clusters. A case study on an MNIST project shows that different teams can collaborate using Scanflow at different phases of an ML project, and that agents are effective at maintaining the model's accuracy and the throughput of model serving while running in production.
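To make the idea of an agent that "maintains accuracy and throughput" more concrete, here is a minimal sketch of the analyze and plan stages of a generic MAPE-K-style autonomic loop (Monitor, Analyze, Plan, Execute over shared Knowledge). This is not Scanflow's API or the authors' implementation. The class names, the fields, the thresholds (95% accuracy, 100 requests/s, at most 8 replicas), and the action strings `retrain` and `scale_to:N` are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ServingMetrics:
    """One monitoring window of a deployed model (hypothetical fields)."""
    accuracy: float        # fraction of correct predictions on labelled feedback
    throughput_rps: float  # requests served per second
    replicas: int          # number of model-serving instances currently running


@dataclass
class AutonomicAgent:
    """Toy analyze/plan loop; NOT Scanflow's API. Thresholds are assumptions."""
    min_accuracy: float = 0.95
    min_throughput_rps: float = 100.0
    max_replicas: int = 8

    def analyze(self, m: ServingMetrics) -> List[str]:
        # Detect which service-level goals are currently violated.
        problems = []
        if m.accuracy < self.min_accuracy:
            problems.append("accuracy")
        if m.throughput_rps < self.min_throughput_rps:
            problems.append("throughput")
        return problems

    def plan(self, m: ServingMetrics, problems: List[str]) -> List[str]:
        # Map each violation to a hypothetical corrective action.
        actions = []
        if "accuracy" in problems:
            actions.append("retrain")
        # Only scale out while below the replica cap.
        if "throughput" in problems and m.replicas < self.max_replicas:
            actions.append(f"scale_to:{m.replicas + 1}")
        return actions

    def step(self, m: ServingMetrics) -> List[str]:
        # One iteration of the control loop: analyze, then plan.
        # Monitoring (collecting m) and execution are left out of this sketch.
        return self.plan(m, self.analyze(m))


if __name__ == "__main__":
    agent = AutonomicAgent()
    print(agent.step(ServingMetrics(accuracy=0.91, throughput_rps=80.0, replicas=2)))
    # prints ['retrain', 'scale_to:3']
```

Separating `analyze` from `plan` keeps "what is wrong" apart from "what to do about it", so a real system could swap in different policies (for example, drift detection instead of a fixed accuracy threshold) without changing how violations are detected.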
