ShardFS vs. IndexFS: replication vs. caching strategies for distributed metadata management in cloud storage systems

Proceedings of the Sixth ACM Symposium on Cloud Computing. Pub Date: 2015-08-27. DOI: 10.1145/2806777.2806844

Lin Xiao, Kai Ren, Qing Zheng, Garth A. Gibson


Citations: 36

Abstract

The rapid growth of cloud storage systems calls for fast and scalable namespace processing. While few commercial file systems offer anything better than federating individually non-scalable namespace servers, a recent academic file system, IndexFS, demonstrates scalable namespace processing based on client caching of directory entries and permissions (directory lookup state) with no per-client state in servers. In this paper we explore explicit replication of directory lookup state in all servers as an alternative to caching this information in all clients. Both approaches eliminate most of the repeated RPCs to different servers that are otherwise needed to resolve hierarchical permission tests. Our realization of server-replicated directory lookup state, ShardFS, employs a novel file-system-specific hybrid of optimistic and pessimistic concurrency control that favors single-object transactions over distributed transactions. Our experiments suggest that if directory lookup state mutation is a fixed fraction of operations (strong scaling for metadata), server replication does not scale as well as client caching; but if directory lookup state mutation is proportional to the number of jobs rather than the number of processes per job (weak scaling for metadata), then server replication can scale more linearly than client caching and also provide lower 70th-percentile response times.
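The trade-off described in the abstract can be illustrated with a toy RPC-counting model. The sketch below is not the IndexFS or ShardFS implementation; the names `ancestors`, `rpcs_no_cache`, `CachingClient`, `rpcs_replicated_read`, and `rpcs_replicated_mkdir` are hypothetical, and the model assumes one RPC per uncached ancestor directory, ignores cache expiry and consistency, and treats a replicated directory mutation as one message per server.

```python
from pathlib import PurePosixPath


def ancestors(path):
    """Return the ancestor directories of an absolute path, root first."""
    return [str(p) for p in list(PurePosixPath(path).parents)[::-1]]


def rpcs_no_cache(path):
    """Baseline: one permission-check RPC per ancestor, plus the operation itself."""
    return len(ancestors(path)) + 1


class CachingClient:
    """Client-side caching of directory lookup state (assumed, simplified model)."""

    def __init__(self):
        self.cache = set()

    def stat(self, path):
        rpcs = 0
        for directory in ancestors(path):
            if directory not in self.cache:
                rpcs += 1  # cache miss: fetch this directory's lookup state
                self.cache.add(directory)
        return rpcs + 1  # plus the RPC for the file operation itself


def rpcs_replicated_read(path):
    """Server replication: the target server already holds all directory lookup state."""
    return 1


def rpcs_replicated_mkdir(num_servers):
    """Server replication: a directory mutation must reach every server."""
    return num_servers


client = CachingClient()
first = client.stat("/a/b/f")   # cold cache: every ancestor is fetched
second = client.stat("/a/b/g")  # warm cache: only the operation itself
```

In this model, reads under server replication always cost one RPC, while each directory mutation costs as many messages as there are servers. That matches the abstract's intuition that replication suffers when directory mutations are a fixed fraction of all operations and fares better when they grow only with the number of jobs.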

