EDRFS: An Effective Distributed Replication File System for Small-File and Data-Intensive Application

2007 2nd International Conference on Communication Systems Software and Middleware
Pub Date: 2007-07-09
DOI: 10.1109/COMSWA.2007.382422

Bin Cai, C. Xie, Guangxi Zhu


Citations: 8

Abstract

As systems keep growing in scale, the key challenges are to mask the failures that arise among system components and to improve the performance of data-intensive applications. This paper designs and implements EDRFS, a cluster-based distributed replication file system, to meet these demands. EDRFS works with a single metadata server and multiple storage nodes, applies a whole-file replication scheme at the file level, and tracks which storage nodes hold the replicas of each file. We use a linear hash algorithm to distribute data and load evenly across the storage nodes, which balances the workload and lets throughput and storage capacity scale incrementally as the system grows. In addition, we employ metadata caches and file data caches in clients to enhance system performance. Furthermore, we deploy a concurrency lock scheme to avoid a bottleneck in namespace operations and a replica consistency method to keep a consistent mutation order among the replicas of a file. We provide initial experimental evaluations of our prototype system on a small-file, data-intensive workload.
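The abstract does not spell out the hash scheme, so the following is only a minimal sketch of how linear hashing could map files to storage nodes, assuming the classic split-pointer formulation. The class name `LinearHashPlacement`, its methods, and the choice of SHA-1 over the file name as the key are all hypothetical and are not taken from EDRFS. The property it illustrates is the incremental scalability mentioned above: adding one node moves files off exactly one existing node.

```python
import hashlib


class LinearHashPlacement:
    """Hypothetical file-to-node mapping based on classic linear hashing."""

    def __init__(self, initial_nodes=2):
        self.n0 = initial_nodes  # number of nodes at level 0 (assumed)
        self.level = 0           # how many times the node set has doubled
        self.split = 0           # index of the next node to be split

    @property
    def node_count(self):
        return self.n0 * (2 ** self.level) + self.split

    @staticmethod
    def _key(name):
        # Assumption: files are keyed by a hash of their path name.
        digest = hashlib.sha1(name.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big")

    def locate(self, name):
        """Return the index of the storage node responsible for `name`."""
        k = self._key(name)
        base = self.n0 * (2 ** self.level)
        idx = k % base
        if idx < self.split:
            # This node was already split at the current level,
            # so use the finer hash function of the next level.
            idx = k % (2 * base)
        return idx

    def add_node(self):
        """Grow by one node; return (split_node, new_node)."""
        base = self.n0 * (2 ** self.level)
        old, new = self.split, base + self.split
        self.split += 1
        if self.split == base:
            # Every node of this level has been split: start the next level.
            self.level += 1
            self.split = 0
        return old, new
```

In this sketch, a caller that adds a node would only need to rescan the files on the returned `split_node` and migrate those whose `locate` result is now `new_node`; files on every other node stay where they are.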

