ChunkFunc: Dynamic SLO-Aware Configuration of Serverless Functions

Impact Factor: 6.0; Zone 2 (Computer Science); JCR Q1 (Computer Science, Theory & Methods)

IEEE Transactions on Parallel and Distributed Systems. Pub Date: 2025-04-09. DOI: 10.1109/TPDS.2025.3559021

Thomas Pusztai; Stefan Nastic

Volume 36, Issue 6, pp. 1237-1252. Full text: https://ieeexplore.ieee.org/document/10959103/

Citations: 0

Abstract

Serverless computing promises to be a cost-effective form of on-demand computing. To fully utilize its cost-saving potential, workflows must be configured with the appropriate amount of resources to meet their response time Service Level Objective (SLO), while keeping costs at a minimum. Since determining and updating these configuration models manually is a nontrivial and error-prone task, researchers have developed solutions for automatically finding configurations that meet the aforementioned requirements. However, our initial experiments show that even when following best practices and using state-of-the-art configuration tools, resources may still be considerably over- or underprovisioned, depending on the size of functions’ input payload. In this paper, we present ChunkFunc, an SLO- and input-data-aware framework for tuning serverless workflows. Our main contributions include: i) an SLO- and input-size-aware function performance model for optimized configurations in serverless workflows, ii) ChunkFunc Profiler, an auto-tuned, Bayesian Optimization-guided profiling mechanism for profiling serverless functions with typical input data sizes to build a performance model, and iii) ChunkFunc Workflow Optimizer, which uses these models to determine an input-size-dependent configuration for each serverless function in a workflow to meet the SLO, while keeping costs to a minimum. We evaluate ChunkFunc on real-life serverless workflows and compare it to two state-of-the-art solutions, showing that it increases SLO adherence by a factor of 1.04 to 2.78, depending on the workflow, and reduces costs by up to 61%.
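To make the idea of an input-size-dependent configuration concrete, here is a minimal sketch. It is not the ChunkFunc Profiler or Workflow Optimizer: the function names, the profiling numbers in `PROFILE`, the cost units, and the helpers `nearest_profiled_size` and `cheapest_config` are all hypothetical, and an exhaustive search over a sequential two-function workflow stands in for whatever optimization the paper actually uses.

```python
from itertools import product

# Hypothetical profiling data, invented for this sketch:
# function -> profiled input size (MB) -> memory (MB) -> (duration_ms, cost_units)
PROFILE = {
    "resize": {
        1:  {256: (800, 10),  512: (500, 14),  1024: (300, 20)},
        50: {256: (6000, 60), 512: (3200, 70), 1024: (1800, 90)},
    },
    "classify": {
        1:  {256: (1200, 15), 512: (700, 19),  1024: (400, 26)},
        50: {256: (5000, 55), 512: (2600, 62), 1024: (1500, 80)},
    },
}


def nearest_profiled_size(function, input_mb):
    """Pick the profiled input size closest to the actual payload size."""
    return min(PROFILE[function], key=lambda size: abs(size - input_mb))


def cheapest_config(workflow, input_mb, slo_ms):
    """Return (config, total_cost, total_ms) for the cheapest combination
    of memory settings that meets the SLO, or None if none does.
    Assumes the functions run one after another."""
    per_function = []
    for function in workflow:
        size = nearest_profiled_size(function, input_mb)
        per_function.append(list(PROFILE[function][size].items()))

    best = None
    for combo in product(*per_function):  # brute force over all combinations
        total_ms = sum(ms for _mem, (ms, _cost) in combo)
        total_cost = sum(cost for _mem, (_ms, cost) in combo)
        if total_ms <= slo_ms and (best is None or total_cost < best[1]):
            config = {f: mem for f, (mem, _) in zip(workflow, combo)}
            best = (config, total_cost, total_ms)
    return best


WORKFLOW = ["resize", "classify"]
small = cheapest_config(WORKFLOW, input_mb=1, slo_ms=4500)
large = cheapest_config(WORKFLOW, input_mb=50, slo_ms=4500)
print(small)
print(large)
```

With these made-up numbers, the 1 MB payload meets the 4500 ms SLO on the smallest memory setting for both functions, while the 50 MB payload needs larger settings. A single static configuration would therefore either overprovision the small request or miss the SLO on the large one, which is the effect the abstract describes.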

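Source journal

IEEE Transactions on Parallel and Distributed Systems (category: Engineering & Technology, Engineering: Electrical & Electronic)

CiteScore: 11.00
Self-citation rate: 9.40%
Articles published: 281
Review time: 5.6 months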

About the journal: IEEE Transactions on Parallel and Distributed Systems (TPDS) is published monthly. It publishes a range of papers, comments on previously published papers, and survey articles that deal with the parallel and distributed systems research areas of current importance to our readers. Particular areas of interest include, but are not limited to:

a) Parallel and distributed algorithms, focusing on topics such as: models of computation; numerical, combinatorial, and data-intensive parallel algorithms, scalability of algorithms and data structures for parallel and distributed systems, communication and synchronization protocols, network algorithms, scheduling, and load balancing.

b) Applications of parallel and distributed computing, including computational and data-enabled science and engineering, big data applications, parallel crowd sourcing, large-scale social network analysis, management of big data, cloud and grid computing, scientific and biomedical applications, mobile computing, and cyber-physical systems.

c) Parallel and distributed architectures, including architectures for instruction-level and thread-level parallelism; design, analysis, implementation, fault resilience and performance measurements of multiple-processor systems; multicore processors, heterogeneous many-core systems; petascale and exascale systems designs; novel big data architectures; special purpose architectures, including graphics processors, signal processors, network processors, media accelerators, and other special purpose processors and accelerators; impact of technology on architecture; network and interconnect architectures; parallel I/O and storage systems; architecture of the memory hierarchy; power-efficient and green computing architectures; dependable architectures; and performance modeling and evaluation.

d) Parallel and distributed software, including parallel and multicore programming languages and compilers, runtime systems, operating systems, Internet computing and web services, resource management including green computing, middleware for grids, clouds, and data centers, libraries, performance modeling and evaluation, parallel programming paradigms, and programming environments and tools.