Duo: Improving Data Sharing of Stateful Serverless Applications by Efficiently Caching Multi-Read Data

Zhuo Huang, Haoqiang Fan, Chaoyi Cheng, Song Wu, Hai Jin
DOI: 10.1109/IPDPS54959.2023.00092
Published in: 2023 IEEE International Parallel and Distributed Processing Symposium (IPDPS), May 2023

Abstract

A growing number of applications are moving to serverless architectures for their high elasticity and fine-grained billing. For stateful applications, however, serverless architectures are likely to cause significant performance degradation, because frequent data sharing between execution stages involves time-consuming remote storage access. Current platforms use in-memory caches to speed up remote access, but conventional caching strategies yield limited improvement. We find experimentally that the reason is that current strategies overlook the stage-dependent access patterns of stateful serverless applications: data that are read multiple times across stages (denoted as multi-read data) are wrongly evicted by data that are read only once (denoted as read-once data), causing a high cache miss ratio.

Accordingly, we propose a new caching strategy, Duo, whose design principle is to cache multi-read data as long as possible. Specifically, Duo maintains a large cache list and a small cache list, which act as a Leader list and a Wingman list, respectively. The Leader list ignores data that are read for the first time, preventing it from being polluted by the massive amount of read-once data produced at each stage. The Wingman list inspects the data ignored or evicted by the Leader list and prefetches those that will probably be read again, based on the observation that multi-read data usually appear periodically in groups. Compared with state-of-the-art approaches, Duo improves the hit ratio by 1.1×-2.1× and reduces the data sharing overhead by 25%-62%.
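To make the two-list design concrete, the following is a minimal sketch of the Leader/Wingman idea as described in the abstract: first-time reads are tracked only in the small Wingman list, and a key is promoted into the large Leader list on its second read, so read-once data never pollutes the Leader list. All class, method, and parameter names here are illustrative assumptions, and the group-based periodic prefetching mentioned in the abstract is omitted; this is not the paper's actual implementation.

```python
from collections import OrderedDict

class DuoCacheSketch:
    """Illustrative two-list cache inspired by the abstract's description.

    The Leader list (large) holds data seen more than once; the Wingman
    list (small) tracks keys ignored or evicted by the Leader list.
    Sizes and names are assumptions for illustration only.
    """

    def __init__(self, leader_size=8, wingman_size=2):
        self.leader = OrderedDict()   # key -> value, in LRU order
        self.wingman = OrderedDict()  # keys the Leader list ignored/evicted
        self.leader_size = leader_size
        self.wingman_size = wingman_size

    def get(self, key, fetch):
        """Return (value, hit). `fetch` models a remote storage read."""
        if key in self.leader:             # multi-read data: cache hit
            self.leader.move_to_end(key)
            return self.leader[key], True
        value = fetch(key)                 # miss: go to remote storage
        if key in self.wingman:            # second read: promote to Leader
            del self.wingman[key]
            self.leader[key] = value
            if len(self.leader) > self.leader_size:
                evicted, _ = self.leader.popitem(last=False)
                self._track(evicted)       # Wingman inspects evictions
        else:
            self._track(key)               # first read: Leader ignores it
        return value, False

    def _track(self, key):
        self.wingman[key] = None
        if len(self.wingman) > self.wingman_size:
            self.wingman.popitem(last=False)
```

Under this sketch, a key misses on its first two reads (the second miss promotes it) and hits from the third read onward, while keys read only once occupy Wingman slots rather than Leader capacity.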