Dirigent: Lightweight Serverless Orchestration

Lazar Cvetković, François Costa, Mihajlo Djokic, Michal Friedman, Ana Klimovic
arXiv - CS - Operating Systems, published 2024-04-25. DOI: arxiv-2404.16393 (https://doi.org/arxiv-2404.16393)

Abstract

While Function as a Service (FaaS) platforms can initialize function sandboxes on worker nodes in tens to hundreds of milliseconds, the latency to schedule functions in real FaaS clusters can be orders of magnitude higher. We find that the current approach of building FaaS cluster managers on top of legacy orchestration systems like Kubernetes leads to high scheduling delay under high sandbox churn, which is typical in FaaS clusters. Generic cluster managers use hierarchical abstractions and multiple internal components to manage and reconcile state with frequent persistent updates. This design becomes a bottleneck for FaaS, where cluster state changes frequently because sandboxes are created on the critical path of requests. Based on our root cause analysis of performance issues in existing FaaS cluster managers, we propose Dirigent, a clean-slate system architecture for FaaS orchestration built on three key principles. First, Dirigent optimizes internal cluster manager abstractions to simplify state management. Second, it eliminates persistent state updates on the critical path of function invocations, leveraging the fact that FaaS abstracts sandboxes from users to relax exact state reconstruction guarantees. Finally, Dirigent runs monolithic control and data planes to minimize internal communication overheads and maximize throughput. We compare Dirigent to state-of-the-art FaaS platforms and show that Dirigent reduces 99th percentile per-function scheduling latency for a production workload by 2.79x compared to AWS Lambda and can spin up 2500 sandboxes per second at low latency, which is 1250x more than with Knative.
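The second principle can be illustrated with a minimal sketch. It is not Dirigent's implementation: the class `ControlPlane`, its methods, and the in-memory list standing in for a persistent store are all hypothetical names chosen for illustration. The sketch assumes that function registrations are the only durable state and that sandbox state may simply be dropped on recovery, which is one possible reading of "relaxed exact state reconstruction".

```python
from dataclasses import dataclass, field


@dataclass
class ControlPlane:
    """Toy control plane (hypothetical): only registrations are durable."""
    durable_log: list = field(default_factory=list)  # stands in for a persistent store
    functions: set = field(default_factory=set)      # in-memory view of registered functions
    sandboxes: dict = field(default_factory=dict)    # in-memory only: sandbox id -> function
    _next_id: int = 0

    def register_function(self, name: str) -> None:
        # Off the critical path: rare and user-visible, so it is persisted.
        self.durable_log.append(("register", name))
        self.functions.add(name)

    def create_sandbox(self, name: str) -> int:
        # On the critical path: updates memory only, no durable write.
        if name not in self.functions:
            raise KeyError(name)
        self._next_id += 1
        self.sandboxes[self._next_id] = name
        return self._next_id

    @classmethod
    def recover(cls, durable_log: list) -> "ControlPlane":
        # Assumption: after a crash only registrations are replayed;
        # sandbox state is not reconstructed exactly.
        cp = cls(durable_log=list(durable_log))
        for op, name in cp.durable_log:
            if op == "register":
                cp.functions.add(name)
        return cp


cp = ControlPlane()
cp.register_function("resize")
for _ in range(3):
    cp.create_sandbox("resize")
recovered = ControlPlane.recover(cp.durable_log)
```

In this sketch, creating three sandboxes adds nothing to `durable_log`, so the cost of persistence does not grow with sandbox churn. The trade-off is that `recovered` knows which functions exist but not which sandboxes were running.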