{"title":"具有联合批处理和多处理的低成本无服务器推理服务","authors":"Shen Cai, Zhi Zhou, Kongyange Zhao, Xu Chen","doi":"10.1145/3609510.3609816","DOIUrl":null,"url":null,"abstract":"With the emerging of machine learning, many commercial companies increasingly utilize machine learning inference systems as backend services to improve their products. Serverless computing is a modern paradigm that provides auto-scaling, event-driven services, making it particularly well-suited for various domains, including video stream analysis, IoT serving and machine learning applications. The flexible scaling feature of serverless computing is adept at handling the burstiness of ML workloads. However, despite its compatibility with ML inference tasks, the cost of serverless inference systems remain relatively high in comparison to traditional serving paradigms, primarily due to the under-utilization of CPU resources offered by serverless platforms. To tackle this challenge, we design and deploy a serverless inference serving system that incorporates batching and multi-process mechanisms to enhance cost efficiency. By applying a change-point detection algorithm to manage bursty workloads, it optimizes resource usage and achieves lower costs. We employ an Amazon EC2 server for handling request packaging and running the core Bayesian Optimization algorithm without any prior information. The preliminary system, implemented on AWS Lambda, can significantly reduce expenses and save up to 62% compared to the original serverless inference system.","PeriodicalId":149629,"journal":{"name":"Proceedings of the 14th ACM SIGOPS Asia-Pacific Workshop on Systems","volume":"1 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2023-08-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Cost-Efficient Serverless Inference Serving with Joint Batching and Multi-Processing\",\"authors\":\"Shen Cai, Zhi Zhou, Kongyange Zhao, Xu Chen\",\"doi\":\"10.1145/3609510.3609816\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"With the emerging of machine learning, many commercial companies increasingly utilize machine learning inference systems as backend services to improve their products. Serverless computing is a modern paradigm that provides auto-scaling, event-driven services, making it particularly well-suited for various domains, including video stream analysis, IoT serving and machine learning applications. The flexible scaling feature of serverless computing is adept at handling the burstiness of ML workloads. However, despite its compatibility with ML inference tasks, the cost of serverless inference systems remain relatively high in comparison to traditional serving paradigms, primarily due to the under-utilization of CPU resources offered by serverless platforms. To tackle this challenge, we design and deploy a serverless inference serving system that incorporates batching and multi-process mechanisms to enhance cost efficiency. By applying a change-point detection algorithm to manage bursty workloads, it optimizes resource usage and achieves lower costs. We employ an Amazon EC2 server for handling request packaging and running the core Bayesian Optimization algorithm without any prior information. 
The preliminary system, implemented on AWS Lambda, can significantly reduce expenses and save up to 62% compared to the original serverless inference system.\",\"PeriodicalId\":149629,\"journal\":{\"name\":\"Proceedings of the 14th ACM SIGOPS Asia-Pacific Workshop on Systems\",\"volume\":\"1 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2023-08-24\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the 14th ACM SIGOPS Asia-Pacific Workshop on Systems\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3609510.3609816\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 14th ACM SIGOPS Asia-Pacific Workshop on Systems","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3609510.3609816","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Cost-Efficient Serverless Inference Serving with Joint Batching and Multi-Processing
With the rise of machine learning, commercial companies increasingly use machine learning inference systems as backend services to improve their products. Serverless computing is a modern paradigm that provides auto-scaling, event-driven services, making it well suited to domains such as video stream analysis, IoT serving, and machine learning applications. In particular, its flexible scaling handles the burstiness of ML workloads well. However, despite this compatibility with ML inference tasks, the cost of serverless inference systems remains relatively high compared to traditional serving paradigms, primarily because the CPU resources offered by serverless platforms are under-utilized. To tackle this challenge, we design and deploy a serverless inference serving system that combines batching and multi-processing to improve cost efficiency. It applies a change-point detection algorithm to manage bursty workloads, optimizing resource usage and lowering cost. An Amazon EC2 server handles request packaging and runs the core Bayesian Optimization algorithm, which requires no prior information. The preliminary system, implemented on AWS Lambda, significantly reduces expenses, saving up to 62% compared to the original serverless inference system.
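
The paper itself does not include code; the sketch below is an illustrative reconstruction of how joint batching and multi-processing might look inside a single AWS Lambda invocation, not the authors' implementation. The batch payload format and the load_model stub are assumptions. One practical detail worth noting: Lambda's Python runtime lacks /dev/shm, so multiprocessing.Pool and Queue fail there; Process plus Pipe is the usual workaround.

    import json
    import multiprocessing as mp

    class _StubModel:                       # placeholder for a real ML model (assumption)
        def predict(self, x):
            return x

    def load_model():
        return _StubModel()

    def _infer_chunk(conn, chunk):
        model = load_model()                # each worker process loads its own model copy
        conn.send([model.predict(x) for x in chunk])
        conn.close()

    def handler(event, context):
        # Requests are packaged into one batch upstream (here assumed to be the
        # EC2 front end); a non-empty batch is assumed for simplicity.
        batch = json.loads(event["body"])["requests"]
        n = min(mp.cpu_count(), len(batch))
        size = -(-len(batch) // n)          # ceil division keeps chunks contiguous and ordered
        chunks = [batch[i:i + size] for i in range(0, len(batch), size)]
        pipes, procs = [], []
        for chunk in chunks:
            parent, child = mp.Pipe()       # Pipe works on Lambda; Pool/Queue need /dev/shm
            p = mp.Process(target=_infer_chunk, args=(child, chunk))
            p.start()
            pipes.append(parent)
            procs.append(p)
        results = [r for conn in pipes for r in conn.recv()]
        for p in procs:
            p.join()
        return {"statusCode": 200, "body": json.dumps(results)}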
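
The abstract mentions a change-point detection algorithm for bursty workloads without naming it. As one standard choice, a two-sided CUSUM test on the request arrival rate could flag load shifts that trigger re-tuning of the batching configuration; the sketch below is that generic technique, not necessarily the paper's method.

    def cusum(rates, drift=0.5, threshold=5.0):
        """Return indices where the arrival-rate mean appears to shift."""
        mean = rates[0]
        g_pos = g_neg = 0.0
        changes = []
        for i, r in enumerate(rates[1:], start=1):
            g_pos = max(0.0, g_pos + (r - mean - drift))   # evidence of an upward shift
            g_neg = max(0.0, g_neg + (mean - r - drift))   # evidence of a downward shift
            if g_pos > threshold or g_neg > threshold:
                changes.append(i)
                mean = r                                   # restart detection at the new level
                g_pos = g_neg = 0.0
        return changes

    # e.g. requests/second sampled each minute; a burst begins at index 5
    print(cusum([10, 11, 9, 10, 10, 30, 31, 29, 30, 28]))  # -> [5]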
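
Finally, a hedged sketch of what the tuning loop could look like: Bayesian Optimization (here via the scikit-optimize library, an assumption) searching over Lambda memory size and batch size to minimize measured cost per request, with no prior information beyond the search bounds. measure_cost is a placeholder for a real deployment probe; the synthetic latency model and the per-GB-second price are illustrative assumptions only.

    from skopt import gp_minimize
    from skopt.space import Integer

    PRICE_PER_GB_S = 0.0000166667           # AWS Lambda x86 list price, us-east-1 (assumption)

    def measure_cost(memory_mb, batch_size):
        # Placeholder: the real system would invoke the deployed function with a
        # packaged batch and read the billed duration. The synthetic model below
        # (roughly one vCPU per 1769 MB) only makes the script runnable.
        duration_s = 0.05 * batch_size / (memory_mb / 1769)
        return (memory_mb / 1024) * duration_s * PRICE_PER_GB_S / batch_size

    def objective(params):
        memory_mb, batch_size = params
        return measure_cost(memory_mb, batch_size)

    space = [Integer(128, 10240, name="memory_mb"),   # Lambda's allowed memory range
             Integer(1, 64, name="batch_size")]

    result = gp_minimize(objective, space, n_calls=30, random_state=0)
    print("best (memory_mb, batch_size):", result.x, "cost/request:", result.fun)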