FuncPipe: A Pipelined Serverless Framework for Fast and Cost-Efficient Training of Deep Learning Models

Proceedings of the ACM on Measurement and Analysis of Computing Systems. Pub Date: 2022-04-28. DOI: 10.1145/3570607

Yunzhuo Liu, Bo Jiang, Tian Guo, Zimeng Huang, Wen-ping Ma, Xinbing Wang, Chenghu Zhou

Citations: 2

Abstract

Training deep learning (DL) models in the cloud has become the norm. With the emergence of serverless computing and its benefits of true pay-as-you-go pricing and scalability, systems researchers have recently started to provide support for serverless-based training. However, the ability to train DL models on serverless platforms is hindered by the resource limitations of today's serverless infrastructure and by DL models' rapidly growing demand for memory and bandwidth. This paper describes FuncPipe, a novel pipelined training framework designed specifically for serverless platforms that enables fast and low-cost training of DL models. FuncPipe is built on the key insight that model partitioning can bridge both the memory gap and the bandwidth gap between the capacity of serverless functions and the requirements of DL training. Although the idea is conceptually simple, realizing it requires answering several design questions, including how to partition the model, how to configure each serverless function, and how to exploit each function's uplink/downlink bandwidth. In particular, we tailor a micro-batch scheduling policy for the serverless environment, which serves as the basis for the subsequent optimization. Our Mixed-Integer Quadratic Programming formulation automatically and simultaneously configures serverless resources and partitions models to fit within the resource constraints. Lastly, we improve the bandwidth efficiency of storage-based synchronization with a novel pipelined scatter-reduce algorithm. We implement FuncPipe on two popular cloud serverless platforms and show that it achieves 7%-77% cost savings and 1.3X-2.2X speedup compared to state-of-the-art serverless-based frameworks.
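To make the phrase "storage-based synchronization" concrete, the following is a minimal sketch of a plain (non-pipelined) scatter-reduce, in which workers exchange gradient chunks through shared storage instead of talking to each other directly. It is a generic illustration, not FuncPipe's algorithm: the abstract does not describe how the pipelined variant works, so the sketch stops at the baseline pattern. The names `split_chunks` and `storage_scatter_reduce`, the key layout, and the use of an in-memory dict in place of cloud object storage are all assumptions made for this example.

```python
def split_chunks(vec, n):
    """Split a list into n contiguous chunks whose sizes differ by at most one."""
    base, extra = divmod(len(vec), n)
    chunks, start = [], 0
    for i in range(n):
        size = base + (1 if i < extra else 0)
        chunks.append(vec[start:start + size])
        start += size
    return chunks


def storage_scatter_reduce(grads, store=None):
    """Average equally sized per-worker gradients through a shared store.

    `store` is a plain dict standing in for cloud object storage
    (hypothetical; a real worker would issue uploads and downloads).
    """
    store = {} if store is None else store
    n = len(grads)

    # Step 1 (scatter): every worker splits its gradient into n chunks and
    # writes them to the store. A real worker would keep its own chunk local.
    for w, g in enumerate(grads):
        for c, chunk in enumerate(split_chunks(g, n)):
            store[("part", w, c)] = chunk

    # Step 2 (reduce): worker c reads chunk c from all workers, averages the
    # values element-wise, and writes the reduced chunk back.
    for c in range(n):
        parts = [store[("part", w, c)] for w in range(n)]
        store[("reduced", c)] = [sum(vals) / n for vals in zip(*parts)]

    # Step 3 (gather): every worker reads all reduced chunks and reassembles
    # the full averaged gradient.
    merged = [x for c in range(n) for x in store[("reduced", c)]]
    return [list(merged) for _ in range(n)]
```

In this baseline, each worker's uploads in step 1 must finish before the downloads in step 2 can start, so uplink and downlink are never busy at the same time. That idle bandwidth is the kind of inefficiency the abstract says a pipelined design targets.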

Source journal

Proceedings of the ACM on Measurement and Analysis of Computing Systems

CiteScore

3.20

Self-citation rate

0.00%

Number of publications