HybridFS — A High Performance and Balanced File System Framework with Multiple Distributed File Systems

Lidong Zhang, Yongwei Wu, Ruini Xue, Tse-Chuan Hsu, Hongji Yang, Yeh-Ching Chung
{"title":"HybridFS — A High Performance and Balanced File System Framework with Multiple Distributed File Systems","authors":"Lidong Zhang, Yongwei Wu, Ruini Xue, Tse-Chuan Hsu, Hongji Yang, Yeh-Ching Chung","doi":"10.1109/COMPSAC.2017.140","DOIUrl":null,"url":null,"abstract":"In the big data era, the distributed file system is getting more and more significant due to the characteristics of its scale-out capability, high availability, and high performance. Different distributed file systems may have different design goals. For example, some of them are designed to have good performance for small file operations, such as GlusterFS, while some of them are designed for large file operations, such as Hadoop distributed file system. With the divergence of big data applications, a distributed file system may provide good performance for some applications but fails for some other applications, that is, there has no universal distributed file system that can produce good performance for all applications. In this paper, we propose a hybrid file system framework, HybridFS, which can deliver satisfactory performance for all applications. HybridFS is composed of multiple distributed file systems with the integration of advantages of these distributed file systems. In HybridFS, on top of multiple distributed file systems, we have designed a metadata management server to perform three functions: file placement, partial metadata store, and dynamic file migration. The file placement is performed based on a decision tree. The partial metadata store is performed for files whose size is less than a few hundred Bytes to increase throughput. The dynamic file migration is performed to balance the storage usage of distributed file systems without throttling performance. We have implemented HybridFS in java on eight nodes and choose Ceph, HDFS, and GlusterFS as designated distributed file systems. The experimental results show that, in the best case, HybridFS can have up to 30% performance improvement of read/write operations over a single distributed file system. In addition, if the difference of storage usage among multiple distributed file systems is less than 40%, the performance of HybridFS is guaranteed, that is, no performance degradation.","PeriodicalId":6556,"journal":{"name":"2017 IEEE 41st Annual Computer Software and Applications Conference (COMPSAC)","volume":"40 1","pages":"796-805"},"PeriodicalIF":0.0000,"publicationDate":"2017-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"6","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2017 IEEE 41st Annual Computer Software and Applications Conference (COMPSAC)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/COMPSAC.2017.140","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 6

Abstract

In the big data era, the distributed file system is getting more and more significant due to the characteristics of its scale-out capability, high availability, and high performance. Different distributed file systems may have different design goals. For example, some of them are designed to have good performance for small file operations, such as GlusterFS, while some of them are designed for large file operations, such as Hadoop distributed file system. With the divergence of big data applications, a distributed file system may provide good performance for some applications but fails for some other applications, that is, there has no universal distributed file system that can produce good performance for all applications. In this paper, we propose a hybrid file system framework, HybridFS, which can deliver satisfactory performance for all applications. HybridFS is composed of multiple distributed file systems with the integration of advantages of these distributed file systems. In HybridFS, on top of multiple distributed file systems, we have designed a metadata management server to perform three functions: file placement, partial metadata store, and dynamic file migration. The file placement is performed based on a decision tree. The partial metadata store is performed for files whose size is less than a few hundred Bytes to increase throughput. The dynamic file migration is performed to balance the storage usage of distributed file systems without throttling performance. We have implemented HybridFS in java on eight nodes and choose Ceph, HDFS, and GlusterFS as designated distributed file systems. The experimental results show that, in the best case, HybridFS can have up to 30% performance improvement of read/write operations over a single distributed file system. In addition, if the difference of storage usage among multiple distributed file systems is less than 40%, the performance of HybridFS is guaranteed, that is, no performance degradation.
HybridFS -具有多个分布式文件系统的高性能均衡文件系统框架
在大数据时代,分布式文件系统以其横向扩展能力、高可用性和高性能的特点,越来越受到人们的重视。不同的分布式文件系统可能有不同的设计目标。例如,有些是为小文件操作而设计的,比如GlusterFS,而有些是为大文件操作而设计的,比如Hadoop分布式文件系统。随着大数据应用的分化,一个分布式文件系统可能对某些应用提供良好的性能,但对另一些应用却不能提供良好的性能,即没有一个通用的分布式文件系统可以为所有应用提供良好的性能。在本文中,我们提出了一个混合文件系统框架,HybridFS,它可以为所有应用程序提供令人满意的性能。HybridFS是由多个分布式文件系统组成,并综合了这些分布式文件系统的优点。在HybridFS中,在多个分布式文件系统之上,我们设计了一个元数据管理服务器来执行三个功能:文件放置、部分元数据存储和动态文件迁移。文件放置是基于决策树执行的。对小于几百字节的文件进行部分元数据存储,以提高吞吐量。动态文件迁移是为了平衡分布式文件系统的存储使用,而不影响性能。我们在8个节点上用java实现了HybridFS,并选择Ceph、HDFS和GlusterFS作为指定的分布式文件系统。实验结果表明,在最好的情况下,在单个分布式文件系统上,HybridFS可以将读/写操作的性能提高30%。此外,如果多个分布式文件系统之间的存储使用差异小于40%,则可以保证HybridFS的性能,即不降低性能。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:604180095
Book学术官方微信