Dirigent: Lightweight Serverless Orchestration
Lazar Cvetković, François Costa, Mihajlo Djokic, Michal Friedman, Ana Klimovic
arXiv - CS - Operating Systems, arXiv:2404.16393, published 2024-04-25
Abstract
While Function as a Service (FaaS) platforms can initialize function sandboxes on worker nodes in 10-100s of milliseconds, the latency to schedule functions in real FaaS clusters can be orders of magnitude higher. We find that the current approach of building FaaS cluster managers on top of legacy orchestration systems like Kubernetes leads to high scheduling delay at high sandbox churn, which is typical in FaaS clusters. While generic cluster managers use hierarchical abstractions and multiple internal components to manage and reconcile state with frequent persistent updates, this becomes a bottleneck for FaaS, where cluster state frequently changes as sandboxes are created on the critical path of requests. Based on our root-cause analysis of performance issues in existing FaaS cluster managers, we propose Dirigent, a clean-slate system architecture for FaaS orchestration with three key principles. First, Dirigent optimizes internal cluster manager abstractions to simplify state management. Second, it eliminates persistent state updates on the critical path of function invocations, leveraging the fact that FaaS abstracts sandboxes from users to relax exact state reconstruction guarantees. Finally, Dirigent runs monolithic control and data planes to minimize internal communication overheads and maximize throughput. We compare Dirigent to state-of-the-art FaaS platforms and show that Dirigent reduces 99th percentile per-function scheduling latency for a production workload by 2.79x compared to AWS Lambda and can spin up 2500 sandboxes per second at low latency, which is 1250x more than with Knative.
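
The second principle, keeping persistent state updates off the invocation critical path, can be illustrated with a minimal sketch. The Go code below is hypothetical (Orchestrator, SandboxRecord, and pickNode are illustrative names, not Dirigent's actual API): it keeps authoritative sandbox state in memory and defers durable writes to a background write-behind queue, relying on the relaxed state-reconstruction guarantee the abstract describes.

```go
// Hypothetical sketch, not Dirigent's code: illustrates deferring durable
// state updates off the invocation critical path, since FaaS hides sandboxes
// from users and exact state reconstruction after a crash is not required.
package main

import (
	"fmt"
	"time"
)

// SandboxRecord is a minimal, illustrative description of a created sandbox.
type SandboxRecord struct {
	Function string
	Node     string
	Created  time.Time
}

// Orchestrator keeps authoritative state in memory and flushes it to durable
// storage in the background, off the request path.
type Orchestrator struct {
	persistQueue chan SandboxRecord // buffered: enqueue is non-blocking in the common case
}

func NewOrchestrator() *Orchestrator {
	o := &Orchestrator{persistQueue: make(chan SandboxRecord, 1024)}
	go o.persistLoop()
	return o
}

// Invoke places a sandbox for a function. The only work on the critical path
// is the in-memory scheduling decision; durability is deferred.
func (o *Orchestrator) Invoke(function string) SandboxRecord {
	rec := SandboxRecord{Function: function, Node: pickNode(), Created: time.Now()}
	// Asynchronous write-behind instead of a synchronous database commit.
	select {
	case o.persistQueue <- rec:
	default:
		// Queue full: drop the update. State can be rediscovered from worker
		// nodes after a failure, which is the guarantee FaaS lets us relax.
	}
	return rec
}

func (o *Orchestrator) persistLoop() {
	for rec := range o.persistQueue {
		// Stand-in for a real durable write (e.g., to a log or database).
		fmt.Printf("persisted: %s on %s\n", rec.Function, rec.Node)
	}
}

func pickNode() string { return "worker-1" } // placeholder scheduling decision

func main() {
	o := NewOrchestrator()
	o.Invoke("thumbnail-generator")
	time.Sleep(10 * time.Millisecond) // let the background flush run in this demo
}
```

The drop-on-full choice in this sketch mirrors the relaxed guarantee: losing an in-flight persistence update does not affect user-visible correctness, because sandbox state can be rediscovered from the workers after a control-plane failure.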