SMILE: A Cost-Effective System for Serving Massive Pretrained Language Models in The Cloud

Jue Wang, Ke Chen, L. Shou, Dawei Jiang, Gang Chen
DOI: 10.1145/3555041.3589720
Published in: Companion of the 2023 International Conference on Management of Data, June 4, 2023

Abstract

Deep learning models, particularly pre-trained language models (PLMs), have become increasingly important for a variety of applications that require text/language processing. However, these models are resource-intensive and often require costly hardware such as dedicated GPU servers. In response to this issue, we present SMILE, a novel prototype system for efficient deployment and management of such models in the cloud. Our goal is to build a cloud platform from which tenants can easily derive their own custom models and rent PLM processors to run inference services on these models at reduced cost. To facilitate this, we present a co-designed, cost-effective storage and computation scheme for managing massive numbers of customized PLMs under constrained hardware resources via effective resource sharing and multiplexing. Our system consists of four core components: the vPLM creator, vPLM storage appliance, vPLM trainer, and vPLM processor, which allow tenants to easily create, store, train, and use their customized PLMs in the cloud without the need for dedicated hardware or maintenance. In particular, vPLM processors are virtualized from a physical machine and designed to be multi-tenant, enabling efficient utilization of resources by precomputing the intermediate representations of PLMs and using adapters to provide customization instead of training the entire model. This allows tenants to host their PLMs in the cloud at minimal cost. In our demonstration, we show that over 10,000 models can be hosted on a single machine without compromising inference speed or accuracy. Overall, our system provides a convenient and cost-effective solution for tenants to host and manage PLMs in the cloud for their customized tasks.
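The multiplexing idea described above — one frozen backbone whose intermediate representation is precomputed and shared, with a small per-tenant adapter supplying customization — can be illustrated with a minimal sketch. This is not SMILE's actual implementation (the paper does not publish its internals); it is a generic bottleneck-adapter computation in NumPy, with all dimensions and names chosen for illustration only:

```python
import numpy as np

rng = np.random.default_rng(0)

D, R = 768, 8   # hidden size of the shared PLM; adapter bottleneck size (illustrative)
T = 4           # number of tenants multiplexed onto one backbone

# Frozen backbone output for one input. In a SMILE-style design, this
# intermediate representation is computed once and reused by every tenant's
# virtualized PLM, so the expensive backbone pass is amortized.
h = rng.standard_normal(D)

def make_adapter():
    """Per-tenant state: a small bottleneck (down-projection, ReLU, up-projection)."""
    return {"down": rng.standard_normal((D, R)) * 0.02,
            "up":   rng.standard_normal((R, D)) * 0.02}

def adapter_forward(h, adapter):
    """Apply a tenant's residual adapter on top of the shared representation."""
    z = np.maximum(h @ adapter["down"], 0.0)   # down-project + ReLU
    return h + z @ adapter["up"]               # up-project + residual connection

tenants = [make_adapter() for _ in range(T)]
outputs = [adapter_forward(h, a) for a in tenants]

# Each tenant stores only 2*D*R adapter parameters instead of a full model
# copy, which is why thousands of customized models can share one machine.
per_tenant_params = 2 * D * R
print(per_tenant_params)  # 12288 floats per tenant vs. ~10^8 for a full PLM
```

The cost asymmetry is the point: the backbone pass over `h` happens once regardless of the number of tenants, while each tenant's marginal footprint is only the adapter matrices.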