Walle: An End-to-End, General-Purpose, and Large-Scale Production System for Device-Cloud Collaborative Machine Learning

Chengfei Lv, Chaoyue Niu, Renjie Gu, Xiaotang Jiang, Zhaode Wang, B. Liu, Ziqi Wu, Qiulin Yao, Congyu Huang, Panos Huang, Tao Huang, Hui Shu, Jinde Song, Bingzhang Zou, Peng Lan, Guohuan Xu, Fei Wu, Shaojie Tang, Fan Wu, Guihai Chen
{"title":"Walle: An End-to-End, General-Purpose, and Large-Scale Production System for Device-Cloud Collaborative Machine Learning","authors":"Chengfei Lv, Chaoyue Niu, Renjie Gu, Xiaotang Jiang, Zhaode Wang, B. Liu, Ziqi Wu, Qiulin Yao, Congyu Huang, Panos Huang, Tao Huang, Hui Shu, Jinde Song, Bingzhang Zou, Peng Lan, Guohuan Xu, Fei Wu, Shaojie Tang, Fan Wu, Guihai Chen","doi":"10.48550/arXiv.2205.14833","DOIUrl":null,"url":null,"abstract":"To break the bottlenecks of mainstream cloud-based machine learning (ML) paradigm, we adopt device-cloud collaborative ML and build the first end-to-end and general-purpose system, called Walle, as the foundation. Walle consists of a deployment platform, distributing ML tasks to billion-scale devices in time; a data pipeline, efficiently preparing task input; and a compute container, providing a cross-platform and high-performance execution environment, while facilitating daily task iteration. Specifically, the compute container is based on Mobile Neural Network (MNN), a tensor compute engine along with the data processing and model execution libraries, which are exposed through a refined Python thread-level virtual machine (VM) to support diverse ML tasks and concurrent task execution. The core of MNN is the novel mechanisms of operator decomposition and semi-auto search, sharply reducing the workload in manually optimizing hundreds of operators for tens of hardware backends and further quickly identifying the best backend with runtime optimization for a computation graph. The data pipeline introduces an on-device stream processing framework to enable processing user behavior data at source. The deployment platform releases ML tasks with an efficient push-then-pull method and supports multi-granularity deployment policies. We evaluate Walle in practical e-commerce application scenarios to demonstrate its effectiveness, efficiency, and scalability. Extensive micro-benchmarks also highlight the superior performance of MNN and the Python thread-level VM. Walle has been in large-scale production use in Alibaba, while MNN has been open source with a broad impact in the community.","PeriodicalId":90294,"journal":{"name":"Proceedings of the -- USENIX Symposium on Operating Systems Design and Implementation (OSDI). USENIX Symposium on Operating Systems Design and Implementation","volume":"38 1","pages":"249-265"},"PeriodicalIF":0.0000,"publicationDate":"2022-05-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"17","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the -- USENIX Symposium on Operating Systems Design and Implementation (OSDI). USENIX Symposium on Operating Systems Design and Implementation","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.48550/arXiv.2205.14833","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 17

Abstract

To break the bottlenecks of mainstream cloud-based machine learning (ML) paradigm, we adopt device-cloud collaborative ML and build the first end-to-end and general-purpose system, called Walle, as the foundation. Walle consists of a deployment platform, distributing ML tasks to billion-scale devices in time; a data pipeline, efficiently preparing task input; and a compute container, providing a cross-platform and high-performance execution environment, while facilitating daily task iteration. Specifically, the compute container is based on Mobile Neural Network (MNN), a tensor compute engine along with the data processing and model execution libraries, which are exposed through a refined Python thread-level virtual machine (VM) to support diverse ML tasks and concurrent task execution. The core of MNN is the novel mechanisms of operator decomposition and semi-auto search, sharply reducing the workload in manually optimizing hundreds of operators for tens of hardware backends and further quickly identifying the best backend with runtime optimization for a computation graph. The data pipeline introduces an on-device stream processing framework to enable processing user behavior data at source. The deployment platform releases ML tasks with an efficient push-then-pull method and supports multi-granularity deployment policies. We evaluate Walle in practical e-commerce application scenarios to demonstrate its effectiveness, efficiency, and scalability. Extensive micro-benchmarks also highlight the superior performance of MNN and the Python thread-level VM. Walle has been in large-scale production use in Alibaba, while MNN has been open source with a broad impact in the community.
Walle:一个端到端、通用、大规模的设备云协同机器学习生产系统
为了打破主流基于云的机器学习(ML)范式的瓶颈,我们采用了设备-云协同ML,并构建了第一个端到端通用系统Walle作为基础。Walle包括一个部署平台,将机器学习任务及时分发到数十亿规模的设备上;数据管道,高效准备任务输入;以及计算容器,提供跨平台和高性能的执行环境,同时促进日常任务迭代。具体来说,计算容器基于移动神经网络(MNN),一个张量计算引擎,以及数据处理和模型执行库,通过一个改进的Python线程级虚拟机(VM)公开,以支持各种ML任务和并发任务执行。MNN的核心是新的算子分解和半自动搜索机制,大大减少了为数十个硬件后端手动优化数百个算子的工作量,并进一步通过运行时优化来快速识别计算图的最佳后端。数据管道引入设备上流处理框架,实现对用户行为数据的源处理。部署平台通过高效的推拉方式发布ML任务,并支持多粒度部署策略。我们在实际的电子商务应用场景中评估Walle,以证明其有效性、效率和可扩展性。大量的微基准测试还突出了MNN和Python线程级VM的优越性能。Walle已经在阿里巴巴中大规模生产使用,而MNN已经开源,在社区中产生了广泛的影响。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信