NeIL: Intelligent Replica Selection for Distributed Applications

Faraz Ahmed, Lianjie Cao, Ayush Goel, Puneet Sharma
Journal: IEEE Transactions on Machine Learning in Communications and Networking, vol. 2, pp. 1580-1594
Published: 2024-10-11
DOI: 10.1109/TMLCN.2024.3479109
URL: https://ieeexplore.ieee.org/document/10714467/

Abstract

Distributed applications such as cloud gaming and streaming increasingly use edge-to-cloud infrastructure for high availability and performance. While edge infrastructure brings services closer to the end user, it also increases the number of sites on which services must be replicated, which makes replica selection challenging for clients of the replicated services. Traditional replica selection methods, including anycast-based methods and DNS redirection, are performance agnostic, and clients experience degraded network performance when network performance dynamics are not considered in replica selection. In this work, we present NeIL, a client-side replica selection framework that enables network-performance-aware replica selection. We propose to use bandits-with-experts Multi-Armed Bandit (MAB) algorithms and adapt them for replica selection at individual clients without centralized coordination. We evaluate our approach in three setups: a distributed Mininet setup that uses publicly available network performance data from Measurement Lab (M-Lab) to emulate network conditions, a setup with replica servers deployed on AWS, and a global enterprise deployment. Our experimental results show that NeIL performs better than greedy selection 45% of the time and better than or equal to greedy selection 80% of the time, resulting in a net gain in end-to-end network performance. On AWS, we see similar results: NeIL performs better than or equal to greedy selection 75% of the time. We have successfully deployed NeIL in a global enterprise remote device management service with over 4000 client devices, and our analysis shows that NeIL achieves significantly better tail service quality, cutting the 99th percentile tail latency from 5.6 seconds to 1.7 seconds.
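
The abstract does not say which bandits-with-experts algorithm NeIL adapts, so the following is only a minimal sketch of the general idea, using the textbook EXP4 algorithm as a stand-in. Each client keeps one weight per expert, mixes the experts' advice into a probability distribution over replicas, picks a replica, and reweights the experts from the observed reward, with no central coordinator. The class name `Exp4ReplicaSelector`, the helper `latency_to_reward`, the 2-second latency cap, and the two fixed experts in `simulate` are all hypothetical and are not taken from the paper.

```python
import math
import random


def latency_to_reward(latency_s, cap_s=2.0):
    """Map a measured latency to a reward in [0, 1] (assumed mapping, not from the paper)."""
    return max(0.0, 1.0 - latency_s / cap_s)


class Exp4ReplicaSelector:
    """EXP4-style selector: one instance per client, no shared state."""

    def __init__(self, n_replicas, n_experts, gamma=0.1, seed=None):
        self.k = n_replicas
        self.gamma = gamma                  # exploration rate
        self.weights = [1.0] * n_experts    # one weight per expert
        self.rng = random.Random(seed)

    def probabilities(self, advice):
        """advice[i][j] is expert i's probability of choosing replica j."""
        total = sum(self.weights)
        probs = []
        for j in range(self.k):
            mix = sum(w * a[j] for w, a in zip(self.weights, advice)) / total
            probs.append((1 - self.gamma) * mix + self.gamma / self.k)
        return probs

    def select(self, advice):
        probs = self.probabilities(advice)
        replica = self.rng.choices(range(self.k), weights=probs)[0]
        return replica, probs

    def update(self, advice, replica, probs, reward):
        # Importance-weighted reward estimate for the replica actually used.
        estimate = reward / probs[replica]
        for i, a in enumerate(advice):
            self.weights[i] *= math.exp(self.gamma * a[replica] * estimate / self.k)
        # Rescale so the largest weight is 1; this keeps the weights bounded
        # and does not change the resulting probabilities.
        top = max(self.weights)
        self.weights = [w / top for w in self.weights]


def simulate(rounds=200, seed=0):
    """Toy run: expert 0 always advises the fast replica, expert 1 a slow one."""
    latencies = [0.2, 3.0, 3.0]             # made-up per-replica latencies (s)
    advice = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
    selector = Exp4ReplicaSelector(n_replicas=3, n_experts=2, seed=seed)
    for _ in range(rounds):
        replica, probs = selector.select(advice)
        reward = latency_to_reward(latencies[replica])
        selector.update(advice, replica, probs, reward)
    return selector
```

In the toy run, the expert that keeps recommending the low-latency replica ends up with the larger weight, so that replica is chosen more often over time. A real deployment would replace the fixed advice vectors with experts driven by live measurements.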