A milestone for FaaS pipelines; object storage- vs VM-driven data exchange

Proceedings of the 22nd International Middleware Conference: Demos and Posters
Pub Date: 2021-12-06
DOI: 10.1145/3491086.3492472

Germán T. Eizaguirre, Marc Sánchez Artigas, P. López

Citations: 0

Abstract

Serverless functions provide high levels of parallelism, short startup times, and "pay-as-you-go" billing. These attributes make them a natural substrate for data analytics workflows. However, the impossibility of direct communication between functions makes the execution of workflows challenging. The current practice to share intermediate data among functions is through remote object storage (e.g., IBM COS). Contrary to conventional wisdom, the performance of object storage is not well understood. For instance, object storage can even be superior to other simpler approaches like the execution of shuffle stages (e.g., GroupBy) inside powerful VMs to avoid all-to-all transfers between functions. Leveraging a genomics pipeline, we show that object storage is a reasonable choice for data passing when the appropriate number of functions is used in shuffling stages.


