Shard Manager: A Generic Shard Management Framework for Geo-distributed Applications

Q3 Computer Science
Sangmin Lee, Zhenhua Guo, Omer Sunercan, Jun Ying, Thawan Kooburat, Suryadeep Biswal, Jun Chen, K. Huang, Yatpang Cheung, Yiding Zhou, K. Veeraraghavan, Biren Damani, Pol Mauri Ruiz, V. Mehta, Chunqiang Tang
{"title":"Shard Manager: A Generic Shard Management Framework for Geo-distributed Applications","authors":"Sangmin Lee, Zhenhua Guo, Omer Sunercan, Jun Ying, Thawan Kooburat, Suryadeep Biswal, Jun Chen, K. Huang, Yatpang Cheung, Yiding Zhou, K. Veeraraghavan, Biren Damani, Pol Mauri Ruiz, V. Mehta, Chunqiang Tang","doi":"10.1145/3477132.3483546","DOIUrl":null,"url":null,"abstract":"Sharding is widely used to scale an application. Despite a decade of effort to build generic sharding frameworks that can be reused across different applications, the extent of their success remains unclear. We attempt to answer a fundamental question: what barriers prevent a sharding framework from getting adopted by the majority of sharded applications? We analyze hundreds of sharded applications at Facebook and identify two major barriers: 1) lack of support for geo-distributed applications, which account for most of Facebook's applications, and 2) inability to maintain application availability during planned events such as software upgrades, which happen ≈1000 times more frequently than unplanned failures. A sharding framework that does not help applications to address these fundamental challenges is not sufficiently attractive for most applications to adopt it. Other adoption barriers include the burden of supporting many complex applications in a one-size-fit-all sharding framework and the difficulty in supporting sophisticated shard-placement requirements. Theoretically, a constraint solver can handle complex placement requirements, but in practice it is not scalable enough to perform near-realtime shard placement at a global scale. We have overcome these adoption barriers in Facebook's sharding framework called Shard Manager. Currently, Shard Manager is used by hundreds of applications running on over one million machines, which account for about 54% of all sharded applications at Facebook.","PeriodicalId":38935,"journal":{"name":"Operating Systems Review (ACM)","volume":"25 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2021-10-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"15","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Operating Systems Review (ACM)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3477132.3483546","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"Computer Science","Score":null,"Total":0}
引用次数: 15

Abstract

Sharding is widely used to scale an application. Despite a decade of effort to build generic sharding frameworks that can be reused across different applications, the extent of their success remains unclear. We attempt to answer a fundamental question: what barriers prevent a sharding framework from getting adopted by the majority of sharded applications? We analyze hundreds of sharded applications at Facebook and identify two major barriers: 1) lack of support for geo-distributed applications, which account for most of Facebook's applications, and 2) inability to maintain application availability during planned events such as software upgrades, which happen ≈1000 times more frequently than unplanned failures. A sharding framework that does not help applications to address these fundamental challenges is not sufficiently attractive for most applications to adopt it. Other adoption barriers include the burden of supporting many complex applications in a one-size-fit-all sharding framework and the difficulty in supporting sophisticated shard-placement requirements. Theoretically, a constraint solver can handle complex placement requirements, but in practice it is not scalable enough to perform near-realtime shard placement at a global scale. We have overcome these adoption barriers in Facebook's sharding framework called Shard Manager. Currently, Shard Manager is used by hundreds of applications running on over one million machines, which account for about 54% of all sharded applications at Facebook.
Shard Manager:一个用于地理分布式应用程序的通用Shard管理框架
分片被广泛用于扩展应用程序。尽管十年来人们一直在努力构建通用的分片框架,这些框架可以在不同的应用程序中重用,但它们的成功程度仍然不清楚。我们试图回答一个基本问题:是什么阻碍了分片框架被大多数分片应用程序采用?我们分析了Facebook上数百个分片应用程序,并确定了两个主要障碍:1)缺乏对地理分布式应用程序的支持,这占Facebook应用程序的大部分;2)在软件升级等计划事件期间无法维持应用程序的可用性,这比计划外故障发生的频率高出约1000倍。不能帮助应用程序解决这些基本挑战的分片框架对大多数应用程序来说没有足够的吸引力。其他采用障碍包括在一个通用的分片框架中支持许多复杂应用程序的负担,以及支持复杂的分片放置需求的困难。理论上,约束求解器可以处理复杂的放置需求,但在实践中,它的可扩展性不足以在全局范围内执行近实时的分片放置。我们已经在Facebook的分片框架Shard Manager中克服了这些采用障碍。目前,在超过100万台机器上运行的数百个应用程序使用了Shard Manager,这些机器约占Facebook所有分片应用程序的54%。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
Operating Systems Review (ACM)
Operating Systems Review (ACM) Computer Science-Computer Networks and Communications
CiteScore
2.80
自引率
0.00%
发文量
10
期刊介绍: Operating Systems Review (OSR) is a publication of the ACM Special Interest Group on Operating Systems (SIGOPS), whose scope of interest includes: computer operating systems and architecture for multiprogramming, multiprocessing, and time sharing; resource management; evaluation and simulation; reliability, integrity, and security of data; communications among computing processors; and computer system modeling and analysis.
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信