DGSF: Disaggregated GPUs for Serverless Functions

2022 IEEE International Parallel and Distributed Processing Symposium (IPDPS) Pub Date : 2022-05-01 DOI:10.1109/ipdps53621.2022.00077

Henrique Fingler, Zhiting Zhu, Esther Yoon, Zhipeng Jia, E. Witchel, C. Rossbach

{"title":"DGSF: Disaggregated GPUs for Serverless Functions","authors":"Henrique Fingler, Zhiting Zhu, Esther Yoon, Zhipeng Jia, E. Witchel, C. Rossbach","doi":"10.1109/ipdps53621.2022.00077","DOIUrl":null,"url":null,"abstract":"Ease of use and transparent access to elastic resources have attracted many applications away from traditional platforms toward serverless functions. Many of these applications, such as machine learning, could benefit significantly from GPU acceleration. Unfortunately, GPUs remain inaccessible from serverless functions in modern production settings. We present DGSF, a platform that transparently enables serverless functions to use GPUs through general purpose APIs such as CUDA. DGSF solves provisioning and utilization challenges with disaggregation, serving the needs of a potentially large number of functions through virtual GPUs backed by a small pool of physical GPUs on dedicated servers. Disaggregation allows the provider to decouple GPU provisioning from other resources, and enables significant benefits through consolidation. We describe how DGSF solves GPU disaggregation challenges including supporting API transparency, hiding the latency of communication with remote GPUs, and load-balancing access to heavily shared GPUs. Evaluation of our prototype on six workloads shows that DGSF's API remoting optimizations can improve the runtime of a function by up to 50% relative to unoptimized DGSF. Such optimizations, which aggressively remove GPU runtime and object management latency from the critical path, can enable functions running over DGSF to have a lower end-to-end time than when running on a GPU natively. By enabling GPU sharing, DGSF can reduce function queueing latency by up to 53%. We use DGSF to augment AWS Lambda with GPU support, showing similar benefits.","PeriodicalId":321801,"journal":{"name":"2022 IEEE International Parallel and Distributed Processing Symposium (IPDPS)","volume":"36 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"10","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2022 IEEE International Parallel and Distributed Processing Symposium (IPDPS)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ipdps53621.2022.00077","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 10

Abstract

Ease of use and transparent access to elastic resources have attracted many applications away from traditional platforms toward serverless functions. Many of these applications, such as machine learning, could benefit significantly from GPU acceleration. Unfortunately, GPUs remain inaccessible from serverless functions in modern production settings. We present DGSF, a platform that transparently enables serverless functions to use GPUs through general purpose APIs such as CUDA. DGSF solves provisioning and utilization challenges with disaggregation, serving the needs of a potentially large number of functions through virtual GPUs backed by a small pool of physical GPUs on dedicated servers. Disaggregation allows the provider to decouple GPU provisioning from other resources, and enables significant benefits through consolidation. We describe how DGSF solves GPU disaggregation challenges including supporting API transparency, hiding the latency of communication with remote GPUs, and load-balancing access to heavily shared GPUs. Evaluation of our prototype on six workloads shows that DGSF's API remoting optimizations can improve the runtime of a function by up to 50% relative to unoptimized DGSF. Such optimizations, which aggressively remove GPU runtime and object management latency from the critical path, can enable functions running over DGSF to have a lower end-to-end time than when running on a GPU natively. By enabling GPU sharing, DGSF can reduce function queueing latency by up to 53%. We use DGSF to augment AWS Lambda with GPU support, showing similar benefits.

查看原文本刊更多论文

DGSF:用于无服务器功能的分解gpu

易用性和对弹性资源的透明访问吸引了许多应用程序从传统平台转向无服务器功能。许多这样的应用，比如机器学习，都可以从GPU加速中获益。不幸的是，在现代生产环境中，gpu仍然无法从无服务器功能中访问。我们提出DGSF，一个透明的平台，使无服务器功能通过通用api(如CUDA)使用gpu。DGSF通过分解解决了供应和利用方面的挑战，通过专用服务器上的一小块物理gpu池支持的虚拟gpu来满足潜在的大量功能需求。分解允许提供商将GPU配置与其他资源解耦，并通过整合实现显著的好处。我们描述了DGSF如何解决GPU分解挑战，包括支持API透明度，隐藏与远程GPU通信的延迟，以及对大量共享GPU的负载平衡访问。对我们的原型在六个工作负载上的评估表明，相对于未优化的DGSF, DGSF的API远程优化可以将函数的运行时提高多达50%。这种优化从关键路径上积极地消除了GPU运行时和对象管理延迟，可以使在DGSF上运行的功能具有比在本地GPU上运行时更低的端到端时间。通过启用GPU共享，DGSF可以减少多达53%的功能排队延迟。我们使用DGSF来增强具有GPU支持的AWS Lambda，显示出类似的好处。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

求助全文

约1分钟内获得全文求助全文

来源期刊

2022 IEEE International Parallel and Distributed Processing Symposium (IPDPS)

自引率

0.00%

发文量