Scanflow: an end-to-end agent-based autonomic ML workflow manager for clusters

Proceedings of the 22nd International Middleware Conference: Demos and Posters. Published: 2021-12-06. DOI: 10.1145/3491086.3492468

Peini Liu, Gusseppe Bravo Rocca, Jordi Guitart, Ajay Dholakia, David Ellison, M. Hodak

Citations: 5

Abstract

Machine Learning (ML) is more than just training models: the whole life-cycle must be considered. Once deployed, an ML model must be constantly managed, supervised and debugged to guarantee its availability, validity and robustness in dynamic contexts. This demonstration presents Scanflow, an agent-based ML workflow manager that enables autonomic management and supervision of the end-to-end life-cycle of ML workflows on distributed clusters. A case study on an MNIST project shows that different teams can use Scanflow to collaborate at different phases of an ML project, and that its agents are effective at maintaining the accuracy and throughput of the served model while it runs in production.
