Asgard: Are NoSQL databases suitable for ephemeral data in serverless workloads?

Karthick Shankar, Ashraf Y. Mahgoub, Zihan Zhou, Utkarsh Priyam, S. Chaterji
Frontiers in High Performance Computing, vol. 1. Journal article, published 2023-09-04. DOI: https://doi.org/10.3389/fhpcp.2023.1127883

Abstract

Serverless computing platforms are becoming increasingly popular for data analytics applications due to their low management overhead and granular billing strategies. Such analytics frameworks use a Directed Acyclic Graph (DAG) structure in which serverless functions (fine-grained tasks) are represented as nodes and data dependencies between functions are represented as edges. Passing intermediate (ephemeral) data from one function to another has received attention recently, with several works proposing storage systems and optimization methods for it. The state-of-practice method is to pass ephemeral data through remote storage, either disk-based (e.g., Amazon S3), which is slow, or memory-based (e.g., ElastiCache Redis), which is expensive. Although some prominent NoSQL databases, such as Apache Cassandra and ScyllaDB, use both memory and disk, prevailing opinion holds that they are ill-suited for ephemeral data because they are tailored to long-term storage. In our study, titled Asgard, we rigorously examine this assumption. Using Amazon Web Services (AWS) as a testbed with two popular serverless applications, we explore scenarios such as fanout and varying workloads, and gauge the performance benefits of configuring NoSQL databases in a DAG-aware way. Surprisingly, we found that, in terms of end-to-end latency normalized by dollar cost, Apache Cassandra's default setup outperformed Redis by up to 326% and S3 by up to 189%. When optimized with Asgard, Cassandra outperformed its own default configuration by up to 47%. These results identify specific cases where NoSQL databases can outperform the current state of practice.
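To make the DAG terminology in the abstract concrete, the following is a minimal sketch of a serverless workflow in which functions are nodes and data dependencies are edges. The workflow itself (`split`, `map_*`, `reduce`) and the helper names are hypothetical illustrations, not taken from the paper, and the sketch does not reproduce Asgard's configuration logic.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical workflow: each key is a serverless function (node); each value
# lists the downstream functions that consume its ephemeral output (edges).
EDGES = {
    "split": ["map_0", "map_1", "map_2"],
    "map_0": ["reduce"],
    "map_1": ["reduce"],
    "map_2": ["reduce"],
    "reduce": [],
}


def fanout(dag):
    """Return the number of downstream consumers for each function."""
    return {node: len(children) for node, children in dag.items()}


def execution_order(dag):
    """Return one valid order in which producers run before consumers."""
    # TopologicalSorter expects node -> predecessors, so invert the edges.
    preds = {node: set() for node in dag}
    for node, children in dag.items():
        for child in children:
            preds[child].add(node)
    return list(TopologicalSorter(preds).static_order())


def ephemeral_transfers(dag):
    """Count edges, i.e. intermediate objects that pass through remote storage."""
    return sum(len(children) for children in dag.values())


if __name__ == "__main__":
    print(fanout(EDGES))
    print(execution_order(EDGES))
    print(ephemeral_transfers(EDGES))
```

In this toy workflow, each edge corresponds to one piece of ephemeral data written by a producer and read by a consumer, which is the traffic that a storage layer such as S3, Redis, or Cassandra would have to serve.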