A milestone for FaaS pipelines; object storage- vs VM-driven data exchange

Germán T. Eizaguirre, Marc Sánchez Artigas, P. López
Proceedings of the 22nd International Middleware Conference: Demos and Posters, published 2021-12-06
DOI: 10.1145/3491086.3492472 (https://doi.org/10.1145/3491086.3492472)

Abstract

Serverless functions provide high levels of parallelism, short startup times, and "pay-as-you-go" billing. These attributes make them a natural substrate for data analytics workflows. However, functions cannot communicate with each other directly, which makes executing workflows challenging. The current practice is to share intermediate data among functions through remote object storage (e.g., IBM COS). Contrary to conventional wisdom, the performance of object storage is not well understood. For instance, object storage can even outperform seemingly simpler approaches, such as executing shuffle stages (e.g., GroupBy) inside powerful VMs to avoid all-to-all transfers between functions. Using a genomics pipeline, we show that object storage is a reasonable choice for data passing when an appropriate number of functions is used in the shuffle stages.
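
To make the all-to-all exchange concrete, the following is a minimal sketch of a GroupBy shuffle routed through object storage. It is not the authors' implementation: the object store is replaced by an in-memory dictionary (an assumption, so the example runs without cloud credentials), and every name in it (`InMemoryObjectStore`, `map_task`, `reduce_task`, `run_shuffle`, the `shuffle/m{i}/r{j}` key layout) is hypothetical. Each mapper writes one object per reducer, so M mappers and R reducers exchange M × R objects, which is why the number of functions in the shuffle stage matters.

```python
import json
import zlib
from collections import defaultdict


class InMemoryObjectStore:
    """Hypothetical stand-in for a remote object store: a dict of bytes."""

    def __init__(self):
        self._objects = {}

    def put(self, key, data):
        self._objects[key] = data

    def get(self, key):
        return self._objects[key]

    def __len__(self):
        return len(self._objects)


def partition_of(key, num_reducers):
    # CRC32 gives a partition that is stable across processes,
    # unlike Python's built-in hash() for strings.
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def map_task(store, mapper_id, records, num_reducers):
    """Split one mapper's records by key and write one object per reducer."""
    buckets = [[] for _ in range(num_reducers)]
    for key, value in records:
        buckets[partition_of(key, num_reducers)].append([key, value])
    for reducer_id, bucket in enumerate(buckets):
        name = f"shuffle/m{mapper_id}/r{reducer_id}"
        store.put(name, json.dumps(bucket).encode("utf-8"))


def reduce_task(store, reducer_id, num_mappers):
    """Read this reducer's object from every mapper and group values by key."""
    groups = defaultdict(list)
    for mapper_id in range(num_mappers):
        payload = store.get(f"shuffle/m{mapper_id}/r{reducer_id}")
        for key, value in json.loads(payload):
            groups[key].append(value)
    return dict(groups)


def run_shuffle(inputs, num_reducers):
    """Run all map tasks, then all reduce tasks; return groups and object count."""
    store = InMemoryObjectStore()
    for mapper_id, records in enumerate(inputs):
        map_task(store, mapper_id, records, num_reducers)
    grouped = {}
    for reducer_id in range(num_reducers):
        grouped.update(reduce_task(store, reducer_id, len(inputs)))
    return grouped, len(store)
```

In a real deployment each `map_task` and `reduce_task` call would run as a separate function invocation, and `put`/`get` would be requests to the remote store. The VM-driven alternative mentioned in the abstract would instead gather all records on one machine and group them in memory.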